My name is Michael Vigdorchik (314980574), computer science student who is interested in the abundant world of data science and deep learning.
In this semester I decided to participate in the data science workshop course and seize the opprotunity to learn this topic by hands on experience approach. Along the course I planned to build my final project on the topic of medicine. While searching Kaggle, I encountered many datasets that refer to medical conditions but out all that I have seen, I chose the Diabetic Retinopathy Dataset for my project.
Diabetic Retinopathy is an eye disease that leads to blindness, and is most common amongst people with diabetes. Diabetic retinopathy is a condition that occurs with changes in the blood vessels of the retina. There are two stages of diabetic retinopathy:
Anyone with diabetes is at risk for diabetic retinopathy. The longer you have diabetes, the more likely you are to develop diabetic retinopathy.
Your risk rises if you have diabetes and you also smoke, have high blood pressure, or are pregnant.
More information on the condition can be found here: https://www.hopkinsmedicine.org/health/conditions-and-diseases/diabetes/diabetic-retinopathy
While manually examining this dataset of images , I realized that I would want to construct a neural network that would attempt to diagnose the severity of the condition according to some essential features of the disease that may be present in the image of the eye ball.
Then calssify the condition to the most likely severity.
The severity is categorized to 5 labels:
After reading further about the condition, and inspecting the images, (from a computer science student without any medical knowledge obviously) I concluded that it will be a challenging task to diagnose a mild or even a moderate phase of the condition.
According to Wikipedia: Diabetic retinopathy affects up to 80 percent of those who have had diabetes for 20 years or more.
At least 90% of new cases could be reduced with proper treatment and monitoring of the eyes ...
Each year in the United States, diabetic retinopathy accounts for 12% of all new cases of blindness.
It is also the leading cause of blindness in people aged 20 to 64.
This gave me more motivation to fulfill my project on this topic and attempt to lay the foundation of a Convolutional Neural Network model to function as a diagnosis tool.
The dataset consists of color images of left,right eye pairs. Each pair of images represents a case of DR. (One of the 5 cases mentioned above).
The structure of the dataset is quite simple.
Link to dataset: https://www.kaggle.com/datasets/tanlikesmath/diabetic-retinopathy-resized
! pip install kaggle
! mkdir ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json
! kaggle datasets download -d tanlikesmath/diabetic-retinopathy-resized
Warning: Your Kaggle API key is readable by other users on this system! To fix this, you can run 'chmod 600 /root/.kaggle/kaggle.json' Downloading diabetic-retinopathy-resized.zip to /content 100% 7.23G/7.25G [00:52<00:00, 148MB/s] 100% 7.25G/7.25G [00:53<00:00, 147MB/s]
! unzip -q diabetic-retinopathy-resized
! mv /content/trainLabels_cropped.csv /content/resized_train_cropped
! mv /content/trainLabels.csv /content/resized_train
import numpy as np
import os
from pprint import pprint
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import cv2
from tqdm import tqdm
TRAIN_IMAGES_PATH = '/content/resized_train_cropped/resized_train_cropped'
Load the csv file with the DR condition severity scores to a pandas dataframe
train_df = pd.read_csv('/content/resized_train_cropped/trainLabels_cropped.csv')
train_df.head(10)
| Unnamed: 0 | Unnamed: 0.1 | image | level | |
|---|---|---|---|---|
| 0 | 0 | 0 | 10_left | 0 |
| 1 | 1 | 1 | 10_right | 0 |
| 2 | 2 | 2 | 13_left | 0 |
| 3 | 3 | 3 | 13_right | 0 |
| 4 | 4 | 4 | 15_left | 1 |
| 5 | 5 | 5 | 15_right | 2 |
| 6 | 6 | 6 | 16_left | 4 |
| 7 | 7 | 7 | 16_right | 4 |
| 8 | 8 | 8 | 17_left | 0 |
| 9 | 9 | 9 | 17_right | 1 |
Get rid of the 2 first columns. They have unnecessary information.
train_df.drop(['Unnamed: 0', 'Unnamed: 0.1'], axis=1, inplace=True)
train_df.head(10)
| image | level | |
|---|---|---|
| 0 | 10_left | 0 |
| 1 | 10_right | 0 |
| 2 | 13_left | 0 |
| 3 | 13_right | 0 |
| 4 | 15_left | 1 |
| 5 | 15_right | 2 |
| 6 | 16_left | 4 |
| 7 | 16_right | 4 |
| 8 | 17_left | 0 |
| 9 | 17_right | 1 |
train_df.describe()
| level | |
|---|---|
| count | 35108.000000 |
| mean | 0.525863 |
| std | 0.970372 |
| min | 0.000000 |
| 25% | 0.000000 |
| 50% | 0.000000 |
| 75% | 1.000000 |
| max | 4.000000 |
fig, ax = plt.subplots(figsize=(14,7))
plt.title(f'Histogram of DR severity levels (total of {train_df.count()[0]} cases)')
plt.grid(True)
sns.countplot(x='level', palette="ch:.25", data=train_df, ax=ax)
<matplotlib.axes._subplots.AxesSubplot at 0x7f7329232a50>
It seems that we posses many cases of healty eye images, but we don't posses alot of unhealthy specimens
Let's take a look at the images height and width dimensions distributions to make sure whether they are consistent in the dataset.
Create a simple iterator for the dataset, in order to iterate over all the images in the directory
Let's add two additional columns of width, height to our dataframe for each opened image
train_df['width'] = 0
train_df['height'] = 0
train_df.head(7)
| image | level | width | height | |
|---|---|---|---|---|
| 0 | 10_left | 0 | 0 | 0 |
| 1 | 10_right | 0 | 0 | 0 |
| 2 | 13_left | 0 | 0 | 0 |
| 3 | 13_right | 0 | 0 | 0 |
| 4 | 15_left | 1 | 0 | 0 |
| 5 | 15_right | 2 | 0 | 0 |
| 6 | 16_left | 4 | 0 | 0 |
from PIL import Image
def collect_dim_of_images(dir_path: str):
for i, img_path in enumerate(train_df['image']):
with Image.open(dir_path + '/' + img_path + '.jpeg') as img:
train_df.loc[i,'width'] = img.size[0] # width (in pixels)
train_df.loc[i,'height'] = img.size[1] # height (in pixels)
collect_dim_of_images(TRAIN_IMAGES_PATH)
train_df
| image | level | width | height | |
|---|---|---|---|---|
| 0 | 10_left | 0 | 1024 | 1024 |
| 1 | 10_right | 0 | 1024 | 1024 |
| 2 | 13_left | 0 | 1024 | 954 |
| 3 | 13_right | 0 | 1024 | 954 |
| 4 | 15_left | 1 | 1024 | 1023 |
| ... | ... | ... | ... | ... |
| 35103 | 44347_right | 0 | 1024 | 1024 |
| 35104 | 44348_left | 0 | 1024 | 840 |
| 35105 | 44348_right | 0 | 1024 | 841 |
| 35106 | 44349_left | 0 | 1024 | 799 |
| 35107 | 44349_right | 1 | 1024 | 806 |
35108 rows × 4 columns
Now let's plot the distributions of width and height
fig, axis = plt.subplots(1, 2, figsize=(14,7))
fig.suptitle('Distributions of Width and Height Dimensions of Images in Dataset')
sns.histplot(train_df['width'], ax=axis[0])
sns.histplot(train_df['height'], ax=axis[1])
<matplotlib.axes._subplots.AxesSubplot at 0x7fa4abcf0890>
We may see that the width is consistent but the height isn't.
We'll have to make sure to resize the height to a fixed value prior to attempting any neural network procedures. (Resize to 1024).
An additional concern of mine is that the images are quite large.
1024x1024 features/pixels may turn out to be computationaly "difficult" to some extent for our future model.
def resize(img, target_width: int, target_height: int):
return cv2.resize(img, (target_width, target_height))
Let's see how the resize function works on some images
# original image
fig, axis = plt.subplots(1, 2, figsize=(16,16))
img = cv2.imread(TRAIN_IMAGES_PATH + '/410_right.jpeg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) # preserve the original color
resized = resize(img, 1024, 1024)
axis[0].imshow(img)
axis[1].imshow(resized)
axis[0].set_title(f'Original: {img.shape}')
axis[1].set_title(f'Resized: {resized.shape}')
Text(0.5, 1.0, 'Resized: (1024, 1024, 3)')
We are not interested in streching the image till it reaches the target size, as seen above.
The image aspect ratio is no longer kept.
Thus, We need to find a way to pad with black pixels.
def resize_and_pad(img, target_size: int):
"""
Accepts an image and resizes it to a (n x n) squared image with the given size
in pixels.
Preserves the aspect ratio, by padding with border color if needed.
"""
old_size = img.shape[:2] # old_size is in (height, width) format
if old_size[0] == target_size and old_size[1] == target_size:
return img
ratio = float(target_size) / max(old_size)
new_size = tuple([int(x * ratio) for x in old_size]) # new_size should be in (width, height) format
img = cv2.resize(img, (new_size[1], new_size[0]))
delta_w = target_size - new_size[1]
delta_h = target_size - new_size[0]
top, bottom = delta_h // 2, delta_h - (delta_h // 2)
left, right = delta_w // 2, delta_w - (delta_w // 2)
return cv2.copyMakeBorder(img,
top, bottom,
left,right,
cv2.BORDER_CONSTANT,
value=[0,0,0]) # color
# original image
fig, axis = plt.subplots(1, 2, figsize=(16,16))
img = cv2.imread(TRAIN_IMAGES_PATH + '/410_right.jpeg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) # preserve the original color
resized = resize_and_pad(img, 1024)
axis[0].imshow(img)
axis[1].imshow(resized)
axis[0].set_title(f'Original: {img.shape}')
axis[1].set_title(f'Resized: {resized.shape}')
Text(0.5, 1.0, 'Resized: (1024, 1024, 3)')
Now that's the form of images we want to work with.
The aspect ratio is preserved and the additional padding needed was applied succesfully.
Let us now attempt to answer the question: What makes an eye image to be classified with DR severity level greater than 0 ?
We need to evaluate some of the pathological signs caused as a result of the disease.
I will manually pick three "good" images with the following levels: Healthy (0), Moderate (2), and Severe (4).
The difference between each pair of levels is equal on the level scale.
Let us try to distinguish between the three cases in order to get some ideas on what features correspond to each case, and hopefully get some expectations later on.
bold text## What are we looking for ?
Picture source: http://dpeh.in/diabetic-retinopathy-treatment/
fig, axis = plt.subplots(3, 1, figsize=(30,30))
for i, filename in enumerate(['/706_left.jpeg', '/698_right.jpeg', '/966_left.jpeg']):
img = cv2.imread(TRAIN_IMAGES_PATH + filename)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
axis[i].imshow(img)
Healthy sample overall. Maybe there is some negligible signs of DR, but it maybe the poor quality of the picture. Our model will have to deal with these poor conditions later on.
fig, axis = plt.subplots(3, 1, figsize=(30,30))
for i, filename in enumerate(['/129_right.jpeg', '/715_left.jpeg', '/829_left.jpeg']):
img = cv2.imread(TRAIN_IMAGES_PATH + filename)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
axis[i].imshow(img)
You may see some signs of DR:
fig, axis = plt.subplots(3, 1, figsize=(30,30))
for i, filename in enumerate(['/217_right.jpeg', '/936_right.jpeg', '/23403_left.jpeg']):
img = cv2.imread(TRAIN_IMAGES_PATH + filename)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
axis[i].imshow(img)
We may observe clearly that the eye suffers from several signs of DR:
Some images are darker than the others, some are brighter, some are cropped, and some have poor detail resolution.
We need to be able to discover features that are not profoundly affected by the above disturbances.
# 23403_left.jpeg level 4 severe case
img = cv2.imread(TRAIN_IMAGES_PATH + '/23403_left.jpeg')
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
fig, axis = plt.subplots(1, 2, figsize=(22, 22))
axis[0].imshow(img)
axis[1].imshow(gray_img, cmap='gray')
axis[0].set_title('Original Image')
axis[1].set_title('Original Gray Image')
Text(0.5, 1.0, 'Original Gray Image')
This is an original level 4 image vs original gray scaled.
You can see at the bottom of the image that the eye has some cotton wool spots or hard exudates. The gray scale made a bit clearer.
Let's try to make it even more distinguishable.
def change_lightning(img, alpha: float, beta: int):
"""
alpha: [1.0-3.0] float - Simple contrast control
beta: [0-100] int - Simple brightness control
Do the operation new_image(i,j) = alpha*image(i,j) + beta
Instead of these 'for' loops we could have used simply:
new_image = cv.convertScaleAbs(image, alpha=alpha, beta=beta)
but we wanted to show you how to access the pixels :)
Notice: This function is very slow
"""
new_image = np.zeros(img.shape, img.dtype)
for y in range(img.shape[0]):
for x in range(img.shape[1]):
for c in range(img.shape[2]):
new_image[y, x, c] = np.clip(alpha * img[y, x, c] + beta, 0, 255)
return new_image
fig, axis = plt.subplots(1, 2, figsize=(22, 22))
new_img = change_lightning(img=img, alpha=1.2, beta=60) # this function takes way too long
new_img = cv2.cvtColor(new_img, cv2.COLOR_BGR2GRAY)
axis[0].imshow(gray_img, cmap='gray')
axis[1].imshow(new_img, cmap='gray')
axis[0].set_title('Original Gray Image')
axis[1].set_title('After Contrast and Brightness Change')
Text(0.5, 1.0, 'After Contrast and Brightness Change')
Overall improvement. But the function is not optimized and takes way too long for such an improvement.
Also consider that we have 35K images to process.
I also tried to tweak with the α and β parameters but did not get any significantly better results.
Let's search the internet for something alternative and better.
Use the cv2.addWeighted(source1, alpha, source2, beta, gamma[, dst[, dtype]])
It is a technique to blend two images togheter, where each image source has its corresponding weights.
The merged images are the source (original) image and its gaussian blur transform.
The blurring of an image means smoothening of an image i.e., removing outlier pixels that may be noise in the image.
This technique will hopefully improve some of the essential details for us.
The sigma parameter of GaussianBlur is the standard deviation value of kernal along horizontal direction.
fig, axis = plt.subplots(3, 3, figsize=(20, 18))
images_list = [['31014_left', '8144_right', '2874_left'], # level 0
['2880_left', '829_left', '40101_right'], # level 2
['217_right', '936_right', '23403_left']] # level 4
for i, sublist in enumerate(images_list):
for j, filename in enumerate(sublist):
img = cv2.imread(TRAIN_IMAGES_PATH + '/' + filename + '.jpeg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = resize_and_pad(img, 1024)
img = cv2.addWeighted(img, 4, cv2.GaussianBlur(img, ksize=(0, 0), sigmaX=10), -4, 128)
axis[i][j].imshow(img)
axis[i][j].set_title(f'level {i*2}: {filename}')
axis[i][j].set_xticks(())
axis[i][j].set_yticks(())
Ok, now that's way better results relative to previous attempt.
We can observe better the severity differences between different cases.
What's important is that the significant features of DR are enhanced.
Let's checkout how it look in gray scale:
fig, axis = plt.subplots(3, 3, figsize=(20, 18))
for i, sublist in enumerate(images_list):
for j, filename in enumerate(sublist):
img = cv2.imread(TRAIN_IMAGES_PATH + '/' + filename + '.jpeg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img = resize_and_pad(img, 1024)
img = cv2.addWeighted(img, 4, cv2.GaussianBlur(img, ksize=(0, 0), sigmaX=10), -4, 128)
axis[i][j].imshow(img, cmap='gray')
axis[i][j].set_title(f'level {i*2}: {filename}')
axis[i][j].set_xticks(())
axis[i][j].set_yticks(())
Let's see how a larger sigma affects the images. Let's change sigma from 10 to 30:
fig, axis = plt.subplots(3, 3, figsize=(20, 18))
for i, sublist in enumerate(images_list):
for j, filename in enumerate(sublist):
img = cv2.imread(TRAIN_IMAGES_PATH + '/' + filename + '.jpeg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = resize_and_pad(img, 1024)
img = cv2.addWeighted(img, 4, cv2.GaussianBlur(img, ksize=(0, 0), sigmaX=30), -4, 128)
axis[i][j].imshow(img)
axis[i][j].set_title(f'level {i*2}: {filename}')
axis[i][j].set_xticks(())
axis[i][j].set_yticks(())
Some details are more emphasized, but on the other hand, some images suffer from a "white ring" near the edges of the circumference.
This means that their is a trade off here when increasing sigma parameter.
Another technique to to check out
fig, axis = plt.subplots(3, 3, figsize=(20, 18))
for i, sublist in enumerate(images_list):
for j, filename in enumerate(sublist):
img = cv2.imread(TRAIN_IMAGES_PATH + '/' + filename + '.jpeg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = resize_and_pad(img, 1024)
img = cv2.detailEnhance(img, sigma_s=10, sigma_r=0.15)
img = cv2.edgePreservingFilter(img, flags=1, sigma_s=64, sigma_r=0.2)
axis[i][j].imshow(img)
axis[i][j].set_title(f'level {i*2}: {filename}')
axis[i][j].set_xticks(())
axis[i][j].set_yticks(())
The results look satisfying in some images, but I think this technique is less better than the one above with the addWeighted function.
We're still in the EDA phase.
The next step of the project will be to feed in the "resized, black padded and light enhanced" images
into some classification algorithm of Sklearn to begin with as a baseline.
To make sure we don't create what's called a systematic bias towards the black padded images,-
let us check first whether the images that need to be padded are mostly "healthy" images, or "moderate", or other category.
IMAGE_SIZE = 1024 # for squared images: (IMAGE_SIZE x IMAGE_SIZE)
NUM_CLASSES = 5
def count_images_to_pad(image_names: pd.Series):
counter = 0
for name in image_names:
img = cv2.imread(TRAIN_IMAGES_PATH + '/' + name + '.jpeg')
if img.shape[0] != IMAGE_SIZE: # check height (the only dimension that varies in the images)
counter += 1
return counter
# Select rows meeting logical condition on column
names_0 = train_df.loc[train_df['level'] == 0]['image']
names_1 = train_df.loc[train_df['level'] == 1]['image']
names_2 = train_df.loc[train_df['level'] == 2]['image']
names_3 = train_df.loc[train_df['level'] == 3]['image']
names_4 = train_df.loc[train_df['level'] == 4]['image']
# count the images that need to be resized to meete our preferred dimensions: 1024 x 1024
different_size_counters = [count_images_to_pad(names) for names in [names_0, names_1, names_2, names_3, names_4]]
different_size_counters
[16830, 1646, 3267, 557, 475]
# count the images that already come with the preffered dimensions: 1024 x 1024
preferred_size_counters = [len(train_df.loc[train_df['level'] == x]) for x in range(NUM_CLASSES)]
preferred_size_counters = list(map(lambda x: x[0] - x[1], list(zip(preferred_size_counters, different_size_counters))))
preferred_size_counters
[8972, 792, 2021, 315, 233]
Let's plot the above counters in such a way, that we can understand whether there is
a systematic bias towards fixed 1024 sized images or towards different sized images.
x = np.arange(NUM_CLASSES) # the label locations
width = 0.35 # the width of the bars
fig, ax = plt.subplots(figsize=(10, 10))
rects1 = ax.bar(x + width/2, different_size_counters, width, label='Different Size')
rects2 = ax.bar(x - width/2, preferred_size_counters, width, label='Size 1024 x 1024')
# text in grouped bar chart
for bar in ax.patches:
value = bar.get_height()
text = f'{value}'
text_x = bar.get_x() + bar.get_width() / 2
text_y = bar.get_y() + value + 200
ax.text(text_x, text_y, text, ha='center', size=12)
# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('NUmber of Images')
ax.set_title('Number of Preffered Size 1024 x 1024 Images vs Number of Different Size Images')
ax.set_xticks(x)
ax.set_xticklabels(['Healthy','Mild', 'Moderate', 'Proliferative', 'Severe'])
ax.legend()
fig.tight_layout()
The amount of images that are different than 1024 x 1024 (need to be padded) greatly outnumber, by almost x2
the number of images that already come in size 1024 x 1024 in each category class.
This means that - regardless of padding the images to a fixed size, we get a systematic bias towards black padded images.
Classic classification algorithms do not "handle well" raw images input.
(also images after some transormation such as:
padding or change in lighting conditions).
Therefore, we need to find a way to extract features so we can feed in these features instead of the raw (or transformed) images.
We will do this later, after we demonstrate how a "classic" classification algorithm fails with raw images.
def resize_all(dir_path: str, target_size: int, gray=False):
"""
Resize all images within the directory provided by dir_path.
All images will be resized to consistent size: (target_size x target_size)
The resulting images will replace the old ones.
"""
for filename in tqdm(os.listdir(dir_path)):
img = cv2.imread(dir_path + '/' + filename)
if gray:
img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
else:
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
if not cv2.imwrite(dir_path + '/' + filename, resize_and_pad(img, IMAGE_SIZE)):
raise IOError('Error writing resized image')
resize_all(TRAIN_IMAGES_PATH, IMAGE_SIZE) # this takes a while
100%|██████████| 35108/35108 [19:32<00:00, 29.95it/s]
We may now drop the width, height columns from our dataframe. We won't use them.
train_df.drop(['width', 'height'], axis=1, inplace=True)
import sklearn
from sklearn.model_selection import train_test_split
The whole resized dataset cannot fit into memory,
so we'll have to program a batch generator that will provide batches of
of our dataset: images, labels.
Each generated batch must align with the requirements of sklearn modules.
The following class will be useful for us when implementing the training with the batch generator.
It provides a convinient way to access image and its label.
Notice that we apply a transformation on the image upon accessing it.
class DRImagesDataSet:
def __init__(self, path, df):
"""
path: path to dataset directory - images
df: pandas dataframe that represents image name -> label entries
"""
self.path = path
self.df = df
def __len__(self):
return len(self.df)
def __getitem__(self, name: str):
"""
find the image name in the dataframe to get the label,
then load the image and apply these transformations:
- convert to grayscale color
- change lightning conditions (presented in the cells above)
finally return the pair.
Notice: all images are assumed to be te same size of (IMAGE_SIZE x IMAGE_SIZE)
"""
label = self.df.loc[self.df.image == name, 'level'].values[0]
image = cv2.imread(TRAIN_IMAGES_PATH + '/' + name + '.jpeg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
image = cv2.addWeighted(image, 4, cv2.GaussianBlur(image, (0, 0), sigmaX=10), -4, 128)
return image, label
def train_validation_split(self, n, split_ratio, seed=None):
"""
Takes the number of elements/entries in the dataset (size of df usually)
and returns two lists of indices:
train list is about n * split_ration elements
validation list is about n * (1-split_ratio) elements
"""
assert 0 <= split_ratio and split_ratio <= 1
if seed is not None:
np.random.seed(seed)
l = list(range(n))
np.random.shuffle(l)
split_point = int(np.floor(len(l) * split_ratio))
train = l[: split_point]
valid = l[split_point: ]
print(f'train: {len(train)}, validation: {len(valid)}')
return train, valid
Integer divisors of 35108 are: 2, 4, 67, 131, 134, 262, 268, 524, 8777, 17554
We'll choose the batch size from one of these values.
BATCH_SIZE = 262
SLICE_SIZE = 10
NUM_CLASSES = 5
SPLIT_RATIO = 0.7
Some tools for batch processing in sklearn
from itertools import chain, islice
def train(classifier, dataset, train_names, batch_size=BATCH_SIZE):
"""
-> Read only part of the data
-> Partial train the classifier
-> delete the data
-> read other part of the data
-> continue to train the classifier.
classifier: the chosen classifier of our network. must support partial_fit method
dataset: generator of fixed size batches of the dataset
"""
iterator = image_vectors_1d_and_label(dataset, train_names)
for batch in gen_batches(iterator, BATCH_SIZE):
image_batch, label_batch = zip(*batch)
label_batch = list(label_batch)
classifier.partial_fit(image_batch, pd.Series(label_batch), classes=[0 ,1, 2, 3, 4])
print('.', end='')
return classifier
def image_vectors_1d_and_label(dataset, filenames):
"""
gnerator that converts 2D images to 1D vectors and yields them with their
appropriate label
"""
for name in filenames:
image, label = dataset[name]
yield image.flatten(), label
def gen_batches(iterable, size=SLICE_SIZE):
iterator = iter(iterable)
for first in iterator:
yield chain([first], islice(iterator, size - 1))
Define our resized images dataset for sklearn:
dataset = DRImagesDataSet(TRAIN_IMAGES_PATH, train_df)
Perform a split to get training indices and validtion indices:
train_indices, validation_indices = dataset.train_validation_split(n=len(dataset),
split_ratio=SPLIT_RATIO,
seed=42)
train: 24575, validation: 10533
train_df.iloc[train_indices, :].values
array([['17340_right', 0],
['27852_right', 0],
['34901_left', 4],
...,
['4184_left', 0],
['24305_right', 0],
['19876_right', 0]], dtype=object)
Prepare the train_names for the train function, and validation_names for accuracy estimation later on.
train_names = train_df.iloc[train_indices, :].values[:,0]
train_names = train_names.astype(str)
validation_names = train_df.iloc[validation_indices, :].values[:,0]
validation_names = validation_names.astype(str)
from sklearn.naive_bayes import MultinomialNB
I chose Multinomial Naive Bayes classifier according to "Choosing the right estimator" guideline in skleran website:
https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html
classifier = MultinomialNB()
classifier = train(classifier, dataset, train_names, batch_size=BATCH_SIZE)
..............................................................................................
from sklearn.metrics import cohen_kappa_score, precision_score, confusion_matrix, classification_report
def get_true_labels_and_predicted_labels(classifier, batch_iter, batch_size):
"""
Let the model predict the images.
Return the true labels list and their corresponding prediceted labels
"""
predictions = []
true_labels = []
for batch in gen_batches(batch_iter, batch_size):
image_batch, label_batch = zip(*batch)
prediction = classifier.predict(image_batch)
predictions = np.concatenate([predictions, prediction])
true_labels = np.concatenate([true_labels, label_batch])
print('.', end='')
return true_labels, predictions
def draw_confusion_matrix(correct_labels, predictions, num_categories, title):
cf_matrix = confusion_matrix(correct_labels, predictions, labels=range(num_categories))
df_cm = pd.DataFrame(cf_matrix, range(num_categories), range(num_categories))
plt.figure(figsize=(12,9))
ax = sns.heatmap(cf_matrix, annot=True, cmap='Blues')
ax.set_title(title)
ax.set_xlabel('\nPredicted DR Category')
ax.set_ylabel('Actual DR Category\n')
ax.xaxis.set_ticklabels(['Healthy','Mild', 'Moderate', 'Proliferative', 'Severe'])
ax.yaxis.set_ticklabels(['Healthy','Mild', 'Moderate', 'Proliferative', 'Severe'])
plt.show()
true_labels, predicted_labels = get_true_labels_and_predicted_labels(classifier,
image_vectors_1d_and_label(dataset,
validation_names),
BATCH_SIZE)
.........................................
precision = precision_score(true_labels, predicted_labels, average='macro')
print(f'The "precision score" of our prediction: {precision}')
The "precision score" of our prediction: 0.21307213159538888
print(classification_report(true_labels, predicted_labels, target_names=['Healthy', 'Mild', 'Moderate', 'Proliferative', 'Severe']))
draw_confusion_matrix(true_labels, predicted_labels, NUM_CLASSES, 'Naive Bayes Confusion Matrix of Diabetic Retinopathy Images\n')
precision recall f1-score support
Healthy 0.76 0.15 0.25 7727
Mild 0.08 0.21 0.11 713
Moderate 0.17 0.19 0.18 1605
Proliferative 0.03 0.34 0.06 262
Severe 0.03 0.33 0.05 226
accuracy 0.17 10533
macro avg 0.21 0.24 0.13 10533
weighted avg 0.59 0.17 0.22 10533
validation_df = train_df[np.isin(train_df, [validation_names]).any(axis=1)]
print('Validation set stats:')
print('Healthy: ' + str(validation_df['level'].value_counts()[0]))
print('Mild: ' + str(validation_df['level'].value_counts()[1]))
print('Moderate: ' + str(validation_df['level'].value_counts()[2]))
print('Proliferative: ' + str(validation_df['level'].value_counts()[3]))
print('Severe: ' + str(validation_df['level'].value_counts()[4]))
print('Total: ' + str(len(validation_df)))
Validation set stats: Healthy: 7727 Mild: 713 Moderate: 1605 Proliferative: 262 Severe: 226 Total: 10533
We can see from the report above that out of roughly 7700 healthy samples, our model prediced correcty only about 1200.
The overall performance of the model is not good at all.
qwk_score = cohen_kappa_score(true_labels, predicted_labels, weights="quadratic")
print(f'Quadratic Weighted Kappa test score: {qwk_score}')
Quadratic Weighted Kappa test score: 0.0223680222063104
Kappa or Cohen’s Kappa is like classification accuracy, except that it is normalized at the baseline of random chance on your dataset: It basically tells you how much better your classifier is performing over the performance of a classifier that simply guesses at random according to the frequency of each class.
Quadratic Weighted Kappa test score: 0.0233 indicates that our naive bayes model performs as if it guessed poorly.
The precision score of 0.213 does not look promising either.
This was a demonstration of a classical algorithm, failing to categorize,
when given transformed images with black padding bias.
As we mentioned erlier, before demonstrating the failure of above, we want to be able to rely on features extracted from the images
independatly of the image's dimensions and padding situation.
In this EDA pahse, we will try and acheive that with "Histogram of Oriented Gradients" (HOG) technique.
In the HOG feature descriptor, the distribution ( histograms ) of directions of gradients
( oriented gradients ) are used as features.
Gradients ( x and y derivatives ) of an image are useful because the magnitude
of gradients is large around edges and corners ( regions of abrupt intensity changes )
and we know that edges and corners pack in a lot more information about object shape than flat regions.
HOG descriptors may be used for object recognition by providing them as features to a machine learning algorithm.
img = cv2.imread(TRAIN_IMAGES_PATH + '/40101_right.jpeg')
print(img.shape)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img = resize_and_pad(img, IMAGE_SIZE) # 1024
print(img.shape)
(793, 1024, 3) (1024, 1024, 3)
def normalize(img):
return cv2.normalize(img, None, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)
norm_img = normalize(img)
# Calculate gradients
gx = cv2.Sobel(norm_img, cv2.CV_32F, 1, 0, ksize=1)
gy = cv2.Sobel(norm_img, cv2.CV_32F, 0, 1, ksize=1)
# Calculate gradient magnitude and direction ( in degrees )
mag, angle = cv2.cartToPolar(gx, gy, angleInDegrees=True)
fig, axis = plt.subplots(2, 2, figsize=(20,20))
# multiply to enhance colors of plot
axis[0][0].imshow(norm_img);
axis[0][1].imshow(gx*40);
axis[1][0].imshow(gy*40);
axis[1][1].imshow(mag*20);
axis[0][0].set_title('Original - Level 2: 40101_right');
axis[0][1].set_title('X Gradient - Level 2: 40101_right');
axis[1][0].set_title('Y Gradient - Level 2: 40101_right');
axis[1][1].set_title('Gradient Magnitude - Level 2: 40101_right');
Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers). Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers). Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
from skimage.feature import hog
from skimage import exposure
Let's focus the HOG on a small patch of the image where we can clearly notice a "spot" (white spot).
Also check if the contrast enhancement does any difference:
img_1 = cv2.imread(TRAIN_IMAGES_PATH + '/40101_right.jpeg')
img_1 = cv2.cvtColor(img_1, cv2.COLOR_BGR2RGB)
img_2 = cv2.addWeighted(img_1, 4, cv2.GaussianBlur(img_1, ksize=(0, 0), sigmaX=25), -4, 128)
fig, axis = plt.subplots(2, 2, figsize=(16,10))
axis[0][0].imshow(img_1);
axis[0][1].imshow(img_1[300:500, 400:800]);
axis[1][0].imshow(img_2);
axis[1][1].imshow(img_2[300:500, 400:800]);
axis[0][0].set_title('40101_right');
axis[0][1].set_title('White Spot');
axis[1][0].set_title('40101_right contrast enhancement');
axis[1][1].set_title('White Spot contrast enhancement');
# define the patch of the image and resize
crop_1 = img_1[300:500, 400:800] # crop y:y+h, x:x+w (preserve asspect ratio 1:2 as mentioned in the links above)
crop_1 = cv2.resize(crop_1, (128, 64))
crop_2 = img_2[300:500, 400:800]
crop_2 = cv2.resize(crop_2, (128, 64))
fig, axis = plt.subplots(1, 2, figsize=(16,8))
axis[0].imshow(crop_1);
axis[1].imshow(crop_2);
axis[0].set_title('Resized from 200x400 to 64x128');
Explanation of the parametes of HOG function:
# creating hog features
fd, hog_img = hog(img, # the original (normalized) image
orientations=9, # Number of bins in the histogram we want to create [0,20,40,60,80,100,120,140,160]
pixels_per_cell=(8, 8), # Determines the size of the cell
cells_per_block=(2, 2), # Number of cells per block
visualize=True, # A boolean whether to return the image of the HOG
multichannel=True) # We set it to True to tell the function that the last dimension
# is considered as a color channel, instead of spatial
# Returned HOG features descriptor (fd) for the image (1D vector) and a visualisation of the HOG imag
_, hog_img_1 = hog(crop_1, orientations=9, pixels_per_cell=(4, 4), cells_per_block=(1, 1), visualize=True, multichannel=True)
_, hog_img_2 = hog(crop_2, orientations=9, pixels_per_cell=(4, 4), cells_per_block=(1, 1), visualize=True, multichannel=True)
Every specified patch pixel will represent a star, which is a combination of gradient magnitude and angle.
The parameters that yielded the "best" hog image where the main spot is seen clearly by the surrounding stars, and the hard exudates at the right bottom corner are also seen to us.
We got this after some tweaking with the sigmaX parameter of the GaussianBlur function and hog parameters.
fig, axis = plt.subplots(1, 2, figsize=(20,14))
im0 = axis[0].imshow(hog_img_1, cmap='Blues');
im1 = axis[1].imshow(hog_img_2, cmap='Blues');
ax0 = axis[0]
ax1 = axis[1]
plt.colorbar(im0, ax=ax0, orientation='horizontal');
plt.colorbar(im1, ax=ax1, orientation='horizontal');
ax0.set_title('HOG image of original crop');
ax1.set_title('HOG image after contrast enhancement');
The feature values are larger in the right plot than in the left.
img = cv2.imread(TRAIN_IMAGES_PATH + '/40101_right.jpeg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.addWeighted(img, 4, cv2.GaussianBlur(img, ksize=(0, 0), sigmaX=25), -4, 128)
print(f"width: {img.shape[1]} height: {img.shape[0]}")
width: 1024 height: 793
Define the patch size. Resize to 1:2 aspect ratio
img = cv2.resize(img, (128 * 8, 64 * 8)) # target width, target height
print(f"width: {img.shape[1]} height: {img.shape[0]}")
width: 1024 height: 512
fig, axis = plt.subplots(1, 1, figsize=(12,12))
axis.imshow(img);
fd, hog_img = hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2), visualize=True, multichannel=True)
# rescale the plot to amplify the colors of the features for us to see better
hog_image_rescaled = exposure.rescale_intensity(hog_img, in_range=(0, 10))
fig, axis = plt.subplots(1, 1, figsize=(20, 12));
im = axis.imshow(hog_image_rescaled, cmap='Blues');
axis.axis('off');
plt.colorbar(im, ax=axis, orientation='horizontal', shrink=0.5);
axis.set_title('Rescaled Intensity Histogram of Oriented Gradients - 40101_right Level 2');
The "circles" are now more noticed
We get HOG features vector of size:
fd.size
288036
For convinience we want to be able to execute sklearn's SVC algorithm on the HOG features.
Since sklearn does not support partial_fit method for SVC we cannot perform batch process as we did before,
and cannot input the entire train dataset with features of size 288036 because we will run out of available memory and it will take way too long.
Hence, we'll attempt to customize the HOG function parameters in such a way that
we get "reasonable" features for the price of reduced features vector size.
The goal is to reduce the features size to an order of magnitude in the thousands.
After alot of experimentation, this is the acheived result:
img = cv2.imread(TRAIN_IMAGES_PATH + '/40101_right.jpeg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.addWeighted(img, 4, cv2.GaussianBlur(img, ksize=(0, 0), sigmaX=25), -4, 128)
img = cv2.resize(img, (128 * 4, 64 * 4)) # target width, target height
print(img.shape)
fig, axis = plt.subplots(1, 1, figsize=(8,8))
axis.imshow(img);
(256, 512, 3)
Crop to get rid of the background (to avoid any bias)
crop = img[25:225, 50:450] # crop y:y+h, x:x+w, preserve 1:2
fig, axis = plt.subplots(1, 1, figsize=(8,8))
axis.imshow(crop);
crop.shape
(200, 400, 3)
fd, hog_img = hog(crop,
orientations=9,
pixels_per_cell=(10,10),
cells_per_block=(1, 1),
visualize=True, multichannel=True)
fd.size
7200
hog_image_rescaled = exposure.rescale_intensity(hog_img, in_range=(0, 10))
fig, axis = plt.subplots(2, 1, figsize=(14,14))
axis[0].imshow(crop);
axis[1].imshow(hog_image_rescaled, cmap='Blues');
axis[0].set_title('Original image with contrast enhancement - 40101_right Level 2');
axis[1].set_title('Histogram of Oriented Gradients - 40101_right Level 2');
axis[0].axis('off');
axis[1].axis('off');
We can infer some DR related features from this HOG image, less than the previous, but in return we get a features vector size of: 7200
Here is a plot of the raw features vector:
fig, axis = plt.subplots(1, 1, figsize=(10, 8))
axis.set_title('Raw Features - 40101_right Level 2');
axis.plot(fd);
Display some HOG features of Level 0 DR
fig, axis = plt.subplots(2, 3, figsize=(20,6))
for j,name in enumerate(['31014_left', '8144_right', '1192_right']): # level 0
img = cv2.imread(TRAIN_IMAGES_PATH + '/' + name + '.jpeg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.addWeighted(img, 4, cv2.GaussianBlur(img, ksize=(0, 0), sigmaX=25), -4, 128)
img = cv2.resize(img, (128 * 8, 64 * 8))
_, hog_img = hog(img, orientations=9, pixels_per_cell=(16, 16), cells_per_block=(1, 1), visualize=True, multichannel=True)
hog_rescaled = exposure.rescale_intensity(hog_img, in_range=(0, 10))
axis[0][j].imshow(img)
axis[1][j].imshow(hog_rescaled, cmap='Blues')
axis[0][j].set_title(f'level 0: {name}')
axis[0][j].axis('off')
axis[1][j].axis('off')
Display some HOG features of Level 3 DR
fig, axis = plt.subplots(2, 3, figsize=(20,6))
for j,name in enumerate(['163_left', '328_right', '1196_right']): # level 3
img = cv2.imread(TRAIN_IMAGES_PATH + '/' + name + '.jpeg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.addWeighted(img, 4, cv2.GaussianBlur(img, ksize=(0, 0), sigmaX=25), -4, 128)
img = cv2.resize(img, (128 * 8, 64 * 8))
_, hog_img = hog(img, orientations=9, pixels_per_cell=(16, 16), cells_per_block=(1, 1), visualize=True, multichannel=True)
hog_rescaled = exposure.rescale_intensity(hog_img, in_range=(0, 10))
axis[0][j].imshow(img)
axis[1][j].imshow(hog_rescaled, cmap='Blues')
axis[0][j].set_title(f'level 3: {name}')
axis[0][j].axis('off')
axis[1][j].axis('off')
After exploring and understanding the data we are working with, it is time to build a model that will be able to categorize each image to its correct label.
First we'll try out some of Sklearn's classic machine learning algorithms.
These algorithms will get the HOG features vector as input instead of the raw images.
Later on we would attempt to build a Convolutional Neural Network that will attempt to do the same.
This CNN will be based of a known pre-built (maybe even pre-trained) model.
We only add few external layers to it in order to adjust it to our categorization needs.
This method of model building is called "Transfer Learning".
The common goal for all the methods above is to learn the profound features that represent each category.
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.metrics import classification_report, confusion_matrix
Define our extractor class that responsible for loading an image and calculating its HOG vector.
# constants for image resize before hog processing
WIDTH_HOG = 128 * 4 # 512
HEIGHT_HOG = 64 * 4 # 256
class FeatureExtractor:
def __init__(self,
color_space=cv2.COLOR_BGR2RGB,
orientation=9,
pixels_per_cell=(10, 10),
cells_per_block=(1, 1),
multichannel=True):
self.color_space = color_space
self.orientation = orientation
self.pixels_per_cell = pixels_per_cell
self.cells_per_block = cells_per_block
self.multichannel = multichannel
def get_hog_features(self, img_name: str, dtype=None):
img = cv2.imread(TRAIN_IMAGES_PATH + '/' + img_name + '.jpeg')
img = cv2.cvtColor(img, self.color_space)
img = cv2.addWeighted(img, 4, cv2.GaussianBlur(img, ksize=(0, 0), sigmaX=25), -4, 128)
img = cv2.resize(img, (WIDTH_HOG, HEIGHT_HOG))
crop = img[25:225, 50:450] # 200 x 400
fd = hog(crop,
orientations=self.orientation,
pixels_per_cell=self.pixels_per_cell,
cells_per_block=self.cells_per_block,
visualize=False,
multichannel=self.multichannel) # default type: np.float64
if dtype is not None:
return fd.astype(dtype)
return fd
def extract_features(filenames, extractor):
"""
Read in a list of image files and extract HOG features for training a model
Say we have matrix X where each row is a sample/observation and each
column is a variable/feature.
This is the expected input for any sklearn ML function.
X.shape should be [number_of_samples, number_of_features].
"""
print('Creating HOG features matrix')
features_matrix = extractor.get_hog_features(filenames[0], dtype=np.float32)
for name in tqdm(filenames[1:]):
features_matrix = np.vstack((features_matrix,
extractor.get_hog_features(name, dtype=np.float32)))
return features_matrix
extractor = FeatureExtractor()
Extract DR features from all images.
The expected number of features per sample is 7200
X = extract_features(train_df['image'].tolist(), extractor)
Creating HOG features matrix
100%|██████████| 35107/35107 [4:00:56<00:00, 2.43it/s]
# np.save('/content/drive/MyDrive/X_hog_9_10x10_1x1_7200.npy', X)
X = np.load('/content/drive/MyDrive/X_hog_9_10x10_1x1_7200.npy')
X.min(), X.max(), X.shape
(0.0, 1.0, (35108, 7200))
# normalize/standardize the HOG features matrix
X_scaler = StandardScaler().fit(X)
scaled_X = X_scaler.transform(X)
scaled_X.min(), scaled_X.max(), scaled_X.shape
(-8.261503, 14.500915, (35108, 7200))
Split the HOG features matrix to train/test groups:
X_train, X_test, y_train, y_test = train_test_split(scaled_X, # samples
train_df['level'].to_numpy(), # labels
test_size=0.2,
random_state=42) # random_state=np.random.randint(0, 100)
X_train.shape, X_test.shape
((28086, 7200), (7022, 7200))
from sklearn.svm import SVC
model_svc_rbf = SVC(kernel='rbf',
probability=True,
C=1,
gamma='scale',
cache_size=1000,
max_iter=500)
Train the model
print('Fitting model...')
model_svc_rbf.fit(X_train, y_train)
print('Done')
Fitting model... Done
Return the mean accuracy on the given test data and labels.
print(model_svc_rbf.score(X_test, y_test))
0.5085445741953859
Display the confusion matrix
def get_predicted_test_labels(classifier, test_set):
"""
Let the model predict the images.
Return the true labels list and their corresponding prediceted labels
"""
predictions = []
for fd_vec in tqdm(test_set):
prediction = classifier.predict(fd_vec.reshape(1, -1)) # fd_vec is a single sample
predictions = np.concatenate([predictions, prediction])
return predictions
predicted_labels = get_predicted_test_labels(model_svc_rbf, X_test)
draw_confusion_matrix(y_test, predicted_labels, NUM_CLASSES, 'SVC with RBF kernel - Confusion Matrix of Diabetic Retinopathy Images\n')
100%|██████████| 7022/7022 [06:00<00:00, 19.48it/s]
It seems that the model just "guesses" with probability if 75% if the a given healthy sample is indeed health.
It also confuses with healthy and moderate.
That's a progress from the previous attempt with MultinomialNB model on the raw images.
print(classification_report(y_test, predicted_labels, target_names=['Healthy', 'Mild', 'Moderate', 'Proliferative', 'Severe']))
precision recall f1-score support
Healthy 0.75 0.62 0.68 5212
Mild 0.08 0.13 0.10 481
Moderate 0.15 0.26 0.19 1023
Proliferative 0.06 0.06 0.06 159
Severe 0.25 0.05 0.08 147
accuracy 0.51 7022
macro avg 0.26 0.22 0.22 7022
weighted avg 0.59 0.51 0.54 7022
The report above reflects the RBF kernel model. It does somewhat decent job in predicting healthy samples with 0.75 accuracy.
from sklearn.linear_model import SGDClassifier
model_sgd = SGDClassifier(loss='hinge',
penalty='l2',
shuffle=False,
max_iter=1000) # log loss is not supported
print('Fitting model...')
model_sgd.fit(X_train, y_train)
print('Done')
print(model_sgd.score(X_test, y_test))
0.6335801765878667
0.63 score which is slightly better than SVC-RBF's 0.5
predicted_labels = get_predicted_test_labels(model_sgd, X_test)
draw_confusion_matrix(y_test, predicted_labels, NUM_CLASSES, 'SGD Clasiffier - Confusion Matrix of Diabetic Retinopathy Images\n')
100%|██████████| 7022/7022 [00:00<00:00, 8401.30it/s]
print(classification_report(y_test, predicted_labels, target_names=['Healthy', 'Mild', 'Moderate', 'Proliferative', 'Severe']))
precision recall f1-score support
Healthy 0.74 0.82 0.78 5212
Mild 0.09 0.02 0.04 481
Moderate 0.15 0.16 0.16 1023
Proliferative 0.07 0.01 0.01 159
Severe 0.06 0.02 0.03 147
accuracy 0.63 7022
macro avg 0.22 0.21 0.20 7022
weighted avg 0.58 0.63 0.61 7022
It does quite decent job in guessing the healthy samples, but overall poor job of guessing the other categories.
model_multiNB = MultinomialNB()
# without a scaler as required for fit
X_train_org, X_test_org, y_train_org, y_test_org = train_test_split(X,
train_df['level'].to_numpy(),
test_size=0.2,
random_state=42)
print('Fitting model...')
model_multiNB.fit(X_train_org, y_train_org)
print('Done')
Fitting model... Done
model_multiNB.score(X_test_org, y_test_org)
0.6918256906864141
predicted_labels = get_predicted_test_labels(model_multiNB, X_test_org)
draw_confusion_matrix(y_test_org, predicted_labels, NUM_CLASSES, 'Naive Bayes Clasiffier HOG - Confusion Matrix of Diabetic Retinopathy Images\n')
print(classification_report(y_test, predicted_labels, target_names=['Healthy', 'Mild', 'Moderate', 'Proliferative', 'Severe'], zero_division=1))
100%|██████████| 7022/7022 [00:01<00:00, 5093.57it/s]
precision recall f1-score support
Healthy 0.75 0.92 0.83 5212
Mild 0.13 0.03 0.04 481
Moderate 1.00 0.00 0.00 1023
Proliferative 1.00 0.00 0.00 159
Severe 0.06 0.18 0.09 147
accuracy 0.69 7022
macro avg 0.59 0.23 0.19 7022
weighted avg 0.73 0.69 0.62 7022
The matrix does not seem very promising even though the score we got is 0.69. It predited most of the samples as healthy.
So far SGD classifier provided the best model.
Now let us try a different approach to the problem. Convolutional Neural Networks.
Let's try a more modern approach, using a CNN based techniques with keras and TensorFlow.
from IPython.display import Image
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.callbacks import Callback, EarlyStopping, ReduceLROnPlateau, ModelCheckpoint
from tensorflow.keras.utils import Sequence
from tensorflow.keras.utils import plot_model
import tensorflow.keras.layers as L
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.metrics import categorical_crossentropy
from tensorflow.keras import regularizers
from sklearn.metrics import classification_report, confusion_matrix, f1_score, ConfusionMatrixDisplay, cohen_kappa_score
print(keras.__version__, tf.__version__)
2.8.0 2.8.2
!pip install -q shap
import shap
|████████████████████████████████| 569 kB 5.2 MB/s eta 0:00:01
# download the DataFrame from my google drive
! gdown -O /content/images_dataframe.csv 1jRA_Qt5nJUSah0oYG4oKD7WBQ-Kv3yL7
Downloading... From: https://drive.google.com/uc?id=1jRA_Qt5nJUSah0oYG4oKD7WBQ-Kv3yL7 To: /content/images_dataframe.csv 100% 465k/465k [00:00<00:00, 111MB/s]
Shuffle the original images-labels DataFrame:
# load the DataFrame and shuffle
df = pd.read_csv('/content/images_dataframe.csv')
df = df.sample(frac=1).reset_index(drop=True)
print('Healthy: ' + str(df['level'].value_counts()[0]))
print('Mild: ' + str(df['level'].value_counts()[1]))
print('Moderate: ' + str(df['level'].value_counts()[2]))
print('Proliferative: ' + str(df['level'].value_counts()[3]))
print('Severe: ' + str(df['level'].value_counts()[4]))
Healthy: 25802 Mild: 2438 Moderate: 5288 Proliferative: 872 Severe: 708
An imbalanced dataset is a dataset where each category class is represented by a different number of input samples.
Decrease the number of the samples of the biggest class down to size of the smallest class.
(Since we cannot generate synthetic data to increase the number of severe/proliferative casses).
In the following code we'll show how to adjust the number of samples in each category according to our desire.
def sampling_k_elements(group, k=200):
"""
This method selects randomly k elements of each class,
and groups them into a dataframe.
"""
if len(group) < k:
return group
return group.sample(k)
# group by category
g0 = df.groupby('level').get_group(0)
g1 = df.groupby('level').get_group(1)
g2 = df.groupby('level').get_group(2)
g3 = df.groupby('level').get_group(3)
g4 = df.groupby('level').get_group(4)
# sample k number of random samples of each category
balanced = pd.concat([sampling_k_elements(g0, k=2000).reset_index(drop=True),
sampling_k_elements(g1, k=1400).reset_index(drop=True),
sampling_k_elements(g2, k=1400).reset_index(drop=True),
sampling_k_elements(g3, k=872).reset_index(drop=True),
sampling_k_elements(g4, k=708).reset_index(drop=True)],
ignore_index=True)
balanced = balanced.sample(frac=1).reset_index(drop=True) # shuffle the dataframe
print('Healthy: ' + str(balanced['level'].value_counts()[0]))
print('Mild: ' + str(balanced['level'].value_counts()[1]))
print('Moderate: ' + str(balanced['level'].value_counts()[2]))
print('Proliferative: ' + str(balanced['level'].value_counts()[3]))
print('Severe: ' + str(balanced['level'].value_counts()[4]))
print('Total: ' + str(len(balanced)))
Healthy: 2000 Mild: 1400 Moderate: 1400 Proliferative: 872 Severe: 708 Total: 6380
Because the overall balanced dataset is almost 6 times smaller than the original due to above, we'll keep it for later use.
Our first CNN attempt will be on the original dataset.
Split the original DataFrame into two DataFrames: Train and Validation:
VALIDATION_SPLIT = 0.2
split_index = int(len(df) * (1 - VALIDATION_SPLIT))
df_train = df.iloc[: split_index]
df_valid = df.iloc[split_index: ]
print('Train dataset length: ', len(df_train))
print('Valid dataset length: ', len(df_valid))
Train dataset length: 28086 Valid dataset length: 7022
def preprocess_images(src_dir, dst_dir):
"""
Prepare all images for learning tasks by modifying their contrast and size.
The resulting images will be placed in a new directory.
"""
print(f'Modifying existing images in: {src_dir} and writing them to: {dst_dir}')
for name in tqdm(os.listdir(src_dir)):
img = cv2.imread(src_dir + '/' + name)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.addWeighted(img, 4, cv2.GaussianBlur(img, ksize=(0, 0), sigmaX=25), -4, 128)
img = cv2.resize(img, (WIDTH, HEIGHT))
if not cv2.imwrite(dst_dir + '/' + name, img):
raise IOError('Error writing modified image')
# create a new directory for the new images
! mkdir /content/better_images
! chmod 666 /content/better_images
preprocess_images(TRAIN_IMAGES_PATH, '/content/better_images')
# zip the result images and upload to my google drive
! zip -q /content/better_images.zip /content/better_images/*
# save the new images for later session
! cp /content/better_images.zip /content/drive/MyDrive/better_images.zip
Modifying existing images in: /content/resized_train_cropped/resized_train_cropped and writing them to: /content/better_images
100%|██████████| 35108/35108 [2:04:32<00:00, 4.70it/s]
# download the better images from google drive to local /content directory
! gdown -O /content/better_images.zip 1A9j_o3K94j6mwhG2inUco6TCI5vzFIlW
Downloading... From: https://drive.google.com/uc?id=1A9j_o3K94j6mwhG2inUco6TCI5vzFIlW To: /content/better_images.zip 100% 4.23G/4.23G [00:17<00:00, 243MB/s]
# unzip images
! mkdir /content/better_images
! chmod 666 /content/better_images
! unzip -q /content/better_images.zip -d /content/better_images
# in mega bytes:
! du /content/better_images/content/better_images
4208668 /content/better_images/content/better_images
BETTER_IMAGES_PATH = '/content/better_images/content/better_images'
print(len(os.listdir(BETTER_IMAGES_PATH)))
35108
Let us start with a simple "light weight" convolutional neural network called: ResNet50.
This architecture can be used on computer vision tasks such as image classififcation, object localisation, object detection.
We would make our first attempt with this architecture as a base network.
Create our own customized image tensor generators for training and validation.
It's good practice to use a validation split when developing the model. 80% of the images for training and 20% for validation.
class CustomDataGen(Sequence):
def __init__(self,
df,
batch_size,
path,
norm=True,
contrast=False,
resize=False,
height=512, width=512,
onehot=False):
self.data_path = path
self.df = df.copy()
self.batch_size = batch_size
self.n_classes = NUM_CLASSES
self.n = len(self.df)
self.width = width
self.height = height
self.cur_index = 0
self.resize = resize
self.contrast = contrast
self.norm = norm
self.onehot = onehot
self.class_names = CLASS_NAMES
self.n_batches = int(np.math.ceil(self.n / self.batch_size))
def on_epoch_end(self):
"""
This function will be called at the end of every epoch by the fit method.
Thus any modification to the dataset between epochs
may be implemented in this function.
"""
pass
def __getitem__(self, index):
"""
The role of __getitem__ method is to generate one batch of data at
position index in the Sequence.
In this case, one batch of data will be (X, y) value pair where X
represents the input and y represents the output.
X will be a NumPy array of shape [batch_size, input_height, input_width, input_channel].
y will be a tuple with two NumPy arrays of shape [batch_size, n_name] and [batch_size, n_type]).
"""
self.cur_index = index
if index == self.n_batches - 1:
# reached last batch that may have less than batch_size items,
# so we need to handle that case
batch = self.df[index * self.batch_size: ]
else:
batch = self.df[index * self.batch_size: (index + 1) * self.batch_size]
X, y = self.__get_data(batch)
return X, y
def __get_data(self, batch):
"""
This function will take a sub dataframe, batch,
then iterate over the batch and call helper methods.
Aggregate the returned images, and returns a tuple (X, y) accordingly
with the correct labels.
"""
batch_paths = batch['image'].apply(lambda path: self.data_path + '/' + path + '.jpeg')
batch_labels = batch['level']
X_batch = np.asarray([self.__get_input(p) for p in batch_paths.values])
if self.onehot:
y_batch = np.asarray([self.__get_output(y, self.n_classes) for y in batch_labels])
else:
y_batch = np.asarray([y for y in batch_labels])
return X_batch, y_batch
def __get_input(self, path):
"""
Helper method to read raw images from disk.
If the images were preprocessed erlier then comment the
addWeighted and resize functions, and only apply normalization.
"""
img = cv2.imread(path)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
if self.contrast:
img = cv2.addWeighted(img, 4, cv2.GaussianBlur(img, ksize=(0, 0), sigmaX=25), -4, 128)
if self.norm:
img = cv2.normalize(img, None, alpha=0, beta=1, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)
if self.resize:
img = cv2.resize(img, (self.width, self.height))
return img
def __get_output(self, label, num_classes):
"""
This function takes the value and number of classes and
returns a one-hot encoded NumPy array of shape [num_classes,]
"""
return tf.keras.utils.to_categorical(label, num_classes=num_classes, dtype='int')
def __len__(self) -> int:
"""
number of batches the generator can produce.
"""
return self.n_batches
def print_batch(self, index):
"""
prints current sub-frame
"""
if index == self.n_batches:
print(self.df[index * self.batch_size: ])
else:
print(self.df[index * self.batch_size: (index + 1) * self.batch_size])
Some model parameters:
BATCH_SIZE = 8 # to not crash GPU
HEIGHT = 512
WIDTH = 512
CHANNELS = 3
NUM_CLASSES = 5
VALIDATION_SPLIT = 0.2
IMAGE_COUNT = 35108
CLASS_NAMES = ['Healthy', 'Mild', 'Moderate', 'Proliferative', 'Severe']
BETTER_IMAGES_PATH = '/content/better_images/content/better_images'
train_ds = CustomDataGen(df_train, BATCH_SIZE, BETTER_IMAGES_PATH, norm=True, onehot=True)
valid_ds = CustomDataGen(df_valid, BATCH_SIZE, BETTER_IMAGES_PATH, norm=True, onehot=True)
Check that the dimensions that the generator yields are correct
for image_batch, labels_batch in train_ds:
print(image_batch.shape)
print(labels_batch.shape)
break
(8, 512, 512, 3) (8, 5)
Now let's construct our model around the prepraired ResNet50 model.
This model should be light enough to not crash our session with Colab's GPU.
According to ResNet50 documentation, we need to perform some additional preprocessing on the input:
https://www.tensorflow.org/api_docs/python/tf/keras/applications/resnet50/ResNet50
Thus, we add the tf.keras.applications.resnet50.preprocess_input layer to the model:
tf.keras.backend.clear_session()
# input layer
input_layer = keras.Input(shape=(HEIGHT, WIDTH, CHANNELS))
# the required preprocessing of ResNet models
preprocess_layer = tf.keras.applications.resnet50.preprocess_input(input_layer)
# base model
pretrained_model = tf.keras.applications.ResNet50(include_top=False,
weights='imagenet')(preprocess_layer)
pretrained_model.trainable = True
# add layer
flatten_layer = L.GlobalAveragePooling2D()(pretrained_model)
# output layer
output_layer = keras.layers.Dense(NUM_CLASSES, activation='softmax')(flatten_layer)
# construct full model
model = keras.Model(input_layer, output_layer)
# compile the final model
model.compile(
optimizer = 'adam',
loss = 'categorical_crossentropy',
metrics = ['accuracy']
)
model.summary()
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/resnet/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5
94773248/94765736 [==============================] - 0s 0us/step
94781440/94765736 [==============================] - 0s 0us/step
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 512, 512, 3)] 0
tf.__operators__.getitem (S (None, 512, 512, 3) 0
licingOpLambda)
tf.nn.bias_add (TFOpLambda) (None, 512, 512, 3) 0
resnet50 (Functional) (None, None, None, 2048) 23587712
global_average_pooling2d (G (None, 2048) 0
lobalAveragePooling2D)
dense (Dense) (None, 5) 10245
=================================================================
Total params: 23,597,957
Trainable params: 23,544,837
Non-trainable params: 53,120
_________________________________________________________________
plot_model(model, to_file='/content/convnet.png', show_shapes=False, show_layer_names=True)
Image(filename='convnet.png')
While training, we might reach a "learning plateau". Since we don't want to overfit on the training data,
we'll use Early Stopping callback:
This callback will stop the training when there is no improvement in the validaion loss for 4 consecutive epochs.
early_stop = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=4, verbose=1, mode='min')
We'll use keras's default learning rate, which is 0.001
EPOCHS = 10
history = model.fit(train_ds,
validation_data=valid_ds,
epochs=EPOCHS,
callbacks=[early_stop],
shuffle=False)
Epoch 1/10 3511/3511 [==============================] - 869s 243ms/step - loss: 0.9017 - accuracy: 0.7335 - val_loss: 1.5783 - val_accuracy: 0.7360 Epoch 2/10 3511/3511 [==============================] - 851s 242ms/step - loss: 0.8797 - accuracy: 0.7347 - val_loss: 2.4711 - val_accuracy: 0.7360 Epoch 3/10 3511/3511 [==============================] - 850s 242ms/step - loss: 0.8774 - accuracy: 0.7347 - val_loss: 1.1178 - val_accuracy: 0.7360 Epoch 4/10 3511/3511 [==============================] - 850s 242ms/step - loss: 0.8726 - accuracy: 0.7347 - val_loss: 1.4275 - val_accuracy: 0.7360 Epoch 5/10 3511/3511 [==============================] - 849s 242ms/step - loss: 0.8690 - accuracy: 0.7347 - val_loss: 1.5213 - val_accuracy: 0.7360 Epoch 6/10 3511/3511 [==============================] - 850s 242ms/step - loss: 0.8648 - accuracy: 0.7347 - val_loss: 1.4968 - val_accuracy: 0.7360 Epoch 7/10 3511/3511 [==============================] - 850s 242ms/step - loss: 0.8620 - accuracy: 0.7347 - val_loss: 1.4781 - val_accuracy: 0.7360 Epoch 7: early stopping
Plot the training and validation data
def plot_history(history):
tr_acc = history.history['accuracy']
vl_acc = history.history['val_accuracy']
tr_loss = history.history['loss']
vl_loss = history.history['val_loss']
epochs = [i + 1 for i in range(len(tr_acc))]
index_loss = np.argmin(vl_loss) # this is the epoch with the lowest validation loss
vl_lowest = vl_loss[index_loss]
index_acc = np.argmax(vl_acc)
acc_highest = vl_acc[index_acc]
sc_label='best epoch= '+ str(index_loss + 1)
vc_label='best epoch= '+ str(index_acc + 1)
fig,axes = plt.subplots(1, 2, figsize=(16,8));
axes[0].plot(epochs, tr_loss, 'r', label='Training loss');
axes[0].plot(epochs, vl_loss, 'g', label='Validation loss' );
axes[0].scatter(index_loss + 1, vl_lowest, s=150, c='blue', label=sc_label);
axes[0].set_title('Training and Validation Loss');
axes[0].set_xlabel('Epochs');
axes[0].set_ylabel('Loss');
axes[0].grid(True)
axes[0].legend();
axes[1].plot(epochs, tr_acc, 'r', label='Training Accuracy');
axes[1].plot(epochs, vl_acc, 'g', label='Validation Accuracy');
axes[1].scatter(index_acc + 1, acc_highest, s=150, c='blue', label=vc_label);
axes[1].set_title('Training and Validation Accuracy');
axes[1].set_xlabel('Epochs');
axes[1].set_ylabel('Accuracy');
axes[1].grid(True)
axes[1].legend();
plot_history(history)
def display_confusion_matrix(y_true, y_pred, title):
cm = confusion_matrix(y_true, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=CLASS_NAMES)
fig, ax = plt.subplots(figsize=(10,10));
disp.plot(ax=ax, xticks_rotation=45);
ax.set_title(title);
def collect_predictions(model, data_gen):
predictions = np.array([])
# image, label pair batches
for x, y in tqdm(data_gen):
hot_1_guesess = model(x)
# predict labels on batch, returns 1-hot encoded vector
# convert 1-hot vectors to integers and append
predictions = np.concatenate((predictions, np.argmax(hot_1_guesess, axis=1)))
return predictions
# create the validation generator again to reset it
predictions = collect_predictions(model, valid_ds)
true_labels = df_valid['level'].values
100%|██████████| 878/878 [01:57<00:00, 7.49it/s]
print(classification_report(true_labels, predictions, target_names=CLASS_NAMES, zero_division=0))
display_confusion_matrix(true_labels, predictions, 'ResNet50 CNN - Confusion Matrix of Diabetic Retinopathy Images\n')
precision recall f1-score support
Healthy 0.74 1.00 0.85 5168
Mild 0.00 0.00 0.00 492
Moderate 0.00 0.00 0.00 1066
Proliferative 0.00 0.00 0.00 165
Severe 0.00 0.00 0.00 131
accuracy 0.74 7022
macro avg 0.15 0.20 0.17 7022
weighted avg 0.54 0.74 0.62 7022
We can see that the model predicted all the samples to be healthy. That is a terrible model. It did not catch any meaningful features.
Create a custom model based on existing one: EfficientNetB3
EfficientNet has proven to be a very good baseline for many computer vision tasks.
Further information on all sorts of architectures can be found here:
https://keras.io/api/applications/
Here is a graph that depicts of all kinds of architectures for image tasks:
https://www.tensorflow.org/api_docs/python/tf/keras/applications/efficientnet/EfficientNetB3
Note: each Keras Application expects a specific kind of input preprocessing.
For EfficientNet, input preprocessing is included as part of the model (as a Rescaling layer), and thus tf.keras.applications.efficientnet.preprocess_input is actually a pass-through function.
EfficientNet models expect their inputs to be float tensors of pixels with values in the [0-255] range.
Thus, we provide the norm=False argument to the class constructor:
BATCH_SIZE = 8
HEIGHT = 512
WIDTH = 512
CHANNELS = 3
NUM_CLASSES = 5
VALIDATION_SPLIT = 0.2
IMAGE_COUNT = 35108
CLASS_NAMES = ['Healthy', 'Mild', 'Moderate', 'Proliferative', 'Severe']
BETTER_IMAGES_PATH = '/content/better_images/content/better_images'
train_ds = CustomDataGen(df_train, BATCH_SIZE, BETTER_IMAGES_PATH, norm=False)
valid_ds = CustomDataGen(df_valid, BATCH_SIZE, BETTER_IMAGES_PATH, norm=False)
The overall idea of the architecture of the additional layers were taken from: https://keras.io/examples/vision/image_classification_efficientnet_fine_tuning/
A technique to reduce overfitting is to introduce dropout regularization to the network.
When you apply dropout to a layer, it randomly drops out (by setting the activation to zero) a number of output units from the layer during the training process.
Dropout takes a fractional number as its input value, in the form such as 0.1, 0.2, 0.4, etc.
This means dropping out 10%, 20% or 40% of the output units randomly from the applied layer.
Let's create a new neural network with tf.keras.layers.Dropout before training it using the augmented images:
tf.keras.backend.clear_session()
efficientnetb3 = tf.keras.applications.efficientnet.EfficientNetB3(
include_top=False,
weights='imagenet',
input_shape=(HEIGHT, WIDTH, CHANNELS)
)
efficientnetb3.trainable = True
model = tf.keras.Sequential()
model.add(efficientnetb3)
model.add(L.GlobalAveragePooling2D())
model.add(L.Dropout(0.5))
model.add(L.BatchNormalization())
model.add(L.Dense(NUM_CLASSES, activation='softmax'))
optimizer = keras.optimizers.Adam(learning_rate=1e-4, decay=1e-6)
model.compile(optimizer = optimizer,
loss='sparse_categorical_crossentropy',
metrics = ['accuracy'])
Downloading data from https://storage.googleapis.com/keras-applications/efficientnetb3_notop.h5 43941888/43941136 [==============================] - 0s 0us/step 43950080/43941136 [==============================] - 0s 0us/step
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
efficientnetb3 (Functional) (None, 16, 16, 1536) 10783535
global_average_pooling2d (G (None, 1536) 0
lobalAveragePooling2D)
dropout (Dropout) (None, 1536) 0
batch_normalization (BatchN (None, 1536) 6144
ormalization)
dense (Dense) (None, 5) 7685
=================================================================
Total params: 10,797,364
Trainable params: 10,706,989
Non-trainable params: 90,375
_________________________________________________________________
plot_model(model, to_file='/content/convnet.png', show_shapes=False, show_layer_names=True)
Image(filename='convnet.png')
checkpoint = ModelCheckpoint('/content/efficientB3_model.tf', monitor='val_loss', verbose=1, save_best_only=True, mode='min')
learn_control = ReduceLROnPlateau(monitor='val_loss', patience=2, verbose=1, factor=0.5)
early_stop = EarlyStopping(monitor='val_loss', patience=2, verbose=1)
EPOCHS = 5
history = model.fit(train_ds,
validation_data=valid_ds,
epochs=EPOCHS,
callbacks=[learn_control, early_stop]
)
Epoch 1/5 3511/3511 [==============================] - 2038s 573ms/step - loss: 0.9162 - accuracy: 0.7642 - val_loss: 0.5857 - val_accuracy: 0.8483 - lr: 1.0000e-04 Epoch 2/5 3511/3511 [==============================] - 2007s 572ms/step - loss: 0.6352 - accuracy: 0.8077 - val_loss: 0.5121 - val_accuracy: 0.8335 - lr: 1.0000e-04 Epoch 3/5 3511/3511 [==============================] - 2007s 572ms/step - loss: 0.5346 - accuracy: 0.7992 - val_loss: 0.5089 - val_accuracy: 0.8149 - lr: 1.0000e-04 Epoch 4/5 3511/3511 [==============================] - 2003s 571ms/step - loss: 0.4549 - accuracy: 0.7854 - val_loss: 0.5203 - val_accuracy: 0.7770 - lr: 1.0000e-04 Epoch 5/5 3511/3511 [==============================] - ETA: 0s - loss: 0.3511 - accuracy: 0.7696 Epoch 5: ReduceLROnPlateau reducing learning rate to 4.999999873689376e-05. 3511/3511 [==============================] - 2008s 572ms/step - loss: 0.3511 - accuracy: 0.7696 - val_loss: 0.5649 - val_accuracy: 0.8096 - lr: 1.0000e-04 Epoch 5: early stopping
model.save('/content/drive/MyDrive/efficientB3_model.tf', save_format='tf', include_optimizer=True)
INFO:tensorflow:Assets written to: /content/drive/MyDrive/efficientB3_model.tf/assets
plot_history(history)
Due to the decrease of the traning/validation accuracy we can infer that the model overfits.
The model overfits "towards" the healthy samples, because the dataset is imbalanced and these healthy samples are majority.
predictions = collect_predictions(model, valid_ds)
100%|██████████| 878/878 [02:46<00:00, 5.28it/s]
true_labels = df_valid['level'].values
print(classification_report(true_labels, predictions, target_names=CLASS_NAMES, zero_division=0))
precision recall f1-score support
Healthy 0.88 0.97 0.92 5150
Mild 0.37 0.22 0.28 504
Moderate 0.71 0.55 0.62 1063
Proliferative 0.57 0.44 0.50 175
Severe 0.77 0.48 0.59 130
accuracy 0.83 7022
macro avg 0.66 0.53 0.58 7022
weighted avg 0.81 0.83 0.81 7022
display_confusion_matrix(true_labels, predictions, 'EfficientNetB3 CNN - Confusion Matrix of Diabetic Retinopathy Images\n')
If we focus on the F1 score of each category in the report we may see that the model performs very well on the healthy samples, and quite decently on the moderate and severe samples.
This tool helps us to understand the predictions of the model based on the features that it focuses on.
model = tf.keras.models.load_model('/content/drive/MyDrive/efficientB3_model.tf')
Pickup some images from 3 main categories: healthy, moderate, severe
image_batch, label_batch = valid_ds[5]
label_batch[0:8]
array([2, 2, 1, 4, 0, 0, 0, 0])
x0 = image_batch[6]
x2 = image_batch[0]
x4 = image_batch[3]
fig, axis = plt.subplots(1, 3, figsize=(14,10));
axis[0].imshow(x0);
axis[0].set_title('Healthy');
axis[1].imshow(x2);
axis[1].set_title('Moderate');
axis[2].imshow(x4);
axis[2].set_title('Severe');
# python function to get model output; replace this function with your own model function.
def f(x):
# tmp = x.copy()
# preprocess_input(tmp)
# return model(tmp)
return model(x)
# define a masker that is used to mask out partitions of the input image.
masker = shap.maskers.Image("inpaint_telea", (HEIGHT, WIDTH, CHANNELS))
# create an explainer with model and image masker
explainer = shap.Explainer(f, masker, output_names=CLASS_NAMES)
# here we explain 1 image using 1000 evaluations of the underlying model to estimate the SHAP values
shap_values = explainer(np.array([x0]), max_evals=1000, batch_size=BATCH_SIZE, outputs=shap.Explanation.argsort.flip[:5])
Partition explainer: 2it [04:42, 282.86s/it]
shap.image_plot(shap_values)
The 5 graphs with the blue/red squares are the top 5 guesses of the model from left to right.
Each guess is "explained" here by the SHAP values. Each square indicate the "strength" of the feature in that area.
The stronger the feature the more red it is. The weaker the feature, the less affect it has on the probability to predict a specific category.
The SHAP values show us that the main features of the healthy sample here are the peripheral veins.
shap_values = explainer(np.array([x2]), max_evals=1000, batch_size=BATCH_SIZE, outputs=shap.Explanation.argsort.flip[:5])
shap.image_plot(shap_values)
Partition explainer: 2it [04:19, 259.34s/it]
Here, the SHAP values emphasize the damaged blood vessels and on some area that develops hard exudates.
shap_values = explainer(np.array([x4]), max_evals=1000, batch_size=BATCH_SIZE, outputs=shap.Explanation.argsort.flip[:5])
shap.image_plot(shap_values)
Partition explainer: 2it [04:31, 271.42s/it]
Here, their is strong focus on a specific area that contains a unique feature of the severe and proliferative categories: noticeable hard exudates and white cotton wool.
This time with a balanced train/validation set.
This model actially takes input images of shape (300, 300, 3), and the input data should range [0, 255].
Unlike in the previous attempt where we provided input of shape (512, 512, 3).
Normalization is included as part of the model.
https://keras.io/examples/vision/image_classification_efficientnet_fine_tuning/
We'll use the balanced dataset that we created early:
BATCH_SIZE = 16
HEIGHT = 300
WIDTH = 300
CHANNELS = 3
NUM_CLASSES = 5
CLASS_NAMES = ['Healthy', 'Mild', 'Moderate', 'Proliferative', 'Severe']
# notice we refer to the balanced dataset
split_index = int(len(balanced) * (0.8))
balanced_train = balanced.iloc[: split_index]
balanced_valid = balanced.iloc[split_index: ]
print('Train dataset length: ', len(balanced_train))
print('Valid dataset length: ', len(balanced_valid))
Train dataset length: 5104 Valid dataset length: 1276
train_ds = CustomDataGen(balanced_train,
BATCH_SIZE, BETTER_IMAGES_PATH,
norm=False,
resize=True, width=WIDTH, height=HEIGHT,
onehot=True)
valid_ds = CustomDataGen(balanced_valid,
BATCH_SIZE, BETTER_IMAGES_PATH,
norm=False,
resize=True, width=WIDTH, height=HEIGHT,
onehot=True)
for x, y in train_ds:
print(x.shape)
print(np.min(x), np.max(x))
break
(16, 300, 300, 3) 0 255
tf.keras.backend.clear_session()
b3 = tf.keras.applications.efficientnet.EfficientNetB3(
include_top=False,
weights='imagenet',
input_shape=(HEIGHT, WIDTH, CHANNELS)
)
b3.trainable = False # later we'll do a fine tuning with trainable True
model = tf.keras.Sequential()
model.add(b3)
model.add(L.GlobalAveragePooling2D())
model.add(L.Dropout(0.2))
model.add(L.BatchNormalization())
model.add(L.Dense(NUM_CLASSES, activation='softmax'))
optimizer = keras.optimizers.Adam(learning_rate=1e-3, decay=1e-5)
model.compile(optimizer = optimizer,
loss='categorical_crossentropy',
metrics = ['accuracy'])
Downloading data from https://storage.googleapis.com/keras-applications/efficientnetb3_notop.h5 43941888/43941136 [==============================] - 0s 0us/step 43950080/43941136 [==============================] - 0s 0us/step
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
efficientnetb3 (Functional) (None, 10, 10, 1536) 10783535
global_average_pooling2d (G (None, 1536) 0
lobalAveragePooling2D)
dropout (Dropout) (None, 1536) 0
batch_normalization (BatchN (None, 1536) 6144
ormalization)
dense (Dense) (None, 5) 7685
=================================================================
Total params: 10,797,364
Trainable params: 10,757
Non-trainable params: 10,786,607
_________________________________________________________________
checkpoint = ModelCheckpoint('/content/efficientB3_fine_model.tf', monitor='val_loss', verbose=1, save_best_only=True, mode='min')
learn_control = ReduceLROnPlateau(monitor='val_loss', patience=2, verbose=1, factor=0.5)
early_stop = EarlyStopping(monitor='val_loss', patience=3, verbose=1, restore_best_weights=True)
EPOCHS = 20
history = model.fit(train_ds,
validation_data=valid_ds,
epochs=EPOCHS,
callbacks=[learn_control, early_stop]
)
Epoch 1/20 319/319 [==============================] - 61s 133ms/step - loss: 1.6017 - accuracy: 0.3646 - val_loss: 1.2820 - val_accuracy: 0.4083 - lr: 0.0010 Epoch 2/20 319/319 [==============================] - 40s 125ms/step - loss: 1.4293 - accuracy: 0.4101 - val_loss: 1.2457 - val_accuracy: 0.4350 - lr: 0.0010 Epoch 3/20 319/319 [==============================] - 42s 132ms/step - loss: 1.3570 - accuracy: 0.4303 - val_loss: 1.2529 - val_accuracy: 0.4467 - lr: 0.0010 Epoch 4/20 319/319 [==============================] - 40s 124ms/step - loss: 1.3197 - accuracy: 0.4518 - val_loss: 1.2255 - val_accuracy: 0.4671 - lr: 0.0010 Epoch 5/20 319/319 [==============================] - 40s 125ms/step - loss: 1.2793 - accuracy: 0.4602 - val_loss: 1.2311 - val_accuracy: 0.4522 - lr: 0.0010 Epoch 6/20 319/319 [==============================] - 40s 125ms/step - loss: 1.2883 - accuracy: 0.4555 - val_loss: 1.2246 - val_accuracy: 0.4694 - lr: 0.0010 Epoch 7/20 319/319 [==============================] - 40s 126ms/step - loss: 1.2725 - accuracy: 0.4579 - val_loss: 1.2162 - val_accuracy: 0.4687 - lr: 0.0010 Epoch 8/20 319/319 [==============================] - 40s 125ms/step - loss: 1.2656 - accuracy: 0.4516 - val_loss: 1.2079 - val_accuracy: 0.4522 - lr: 0.0010 Epoch 9/20 319/319 [==============================] - 40s 124ms/step - loss: 1.2530 - accuracy: 0.4745 - val_loss: 1.2202 - val_accuracy: 0.4796 - lr: 0.0010 Epoch 10/20 319/319 [==============================] - 40s 125ms/step - loss: 1.2554 - accuracy: 0.4632 - val_loss: 1.2027 - val_accuracy: 0.4647 - lr: 0.0010 Epoch 11/20 319/319 [==============================] - 40s 125ms/step - loss: 1.2503 - accuracy: 0.4683 - val_loss: 1.1948 - val_accuracy: 0.4718 - lr: 0.0010 Epoch 12/20 319/319 [==============================] - 43s 134ms/step - loss: 1.2463 - accuracy: 0.4741 - val_loss: 1.1859 - val_accuracy: 0.4741 - lr: 0.0010 Epoch 13/20 319/319 [==============================] - 43s 135ms/step - loss: 1.2408 - accuracy: 0.4696 - val_loss: 1.2093 - val_accuracy: 0.4624 - lr: 0.0010 Epoch 14/20 319/319 [==============================] - 42s 130ms/step - loss: 1.2431 - accuracy: 0.4741 - val_loss: 1.1845 - val_accuracy: 0.4781 - lr: 0.0010 Epoch 15/20 319/319 [==============================] - 44s 137ms/step - loss: 1.2424 - accuracy: 0.4690 - val_loss: 1.1951 - val_accuracy: 0.4671 - lr: 0.0010 Epoch 16/20 319/319 [==============================] - ETA: 0s - loss: 1.2260 - accuracy: 0.4802 Epoch 16: ReduceLROnPlateau reducing learning rate to 0.0005000000237487257. 319/319 [==============================] - 42s 131ms/step - loss: 1.2260 - accuracy: 0.4802 - val_loss: 1.1954 - val_accuracy: 0.4781 - lr: 0.0010 Epoch 17/20 319/319 [==============================] - 44s 137ms/step - loss: 1.2099 - accuracy: 0.4753 - val_loss: 1.1610 - val_accuracy: 0.4851 - lr: 5.0000e-04 Epoch 18/20 319/319 [==============================] - 42s 132ms/step - loss: 1.1719 - accuracy: 0.4947 - val_loss: 1.1576 - val_accuracy: 0.4718 - lr: 5.0000e-04 Epoch 19/20 319/319 [==============================] - 40s 126ms/step - loss: 1.1747 - accuracy: 0.4980 - val_loss: 1.1726 - val_accuracy: 0.4702 - lr: 5.0000e-04 Epoch 20/20 319/319 [==============================] - ETA: 0s - loss: 1.1747 - accuracy: 0.4943 Epoch 20: ReduceLROnPlateau reducing learning rate to 0.0002500000118743628. 319/319 [==============================] - 39s 124ms/step - loss: 1.1747 - accuracy: 0.4943 - val_loss: 1.1653 - val_accuracy: 0.4694 - lr: 5.0000e-04
model.save('/content/drive/MyDrive/efficientB3_fine_model.tf', save_format='tf', include_optimizer=True)
INFO:tensorflow:Assets written to: /content/drive/MyDrive/efficientB3_fine_model.tf/assets
plot_history(history)
Do another round of training from were we finished, since it seems that 20 epochs were not enough and the model may still improve
EPOCHS = 30
new_history = model.fit(train_ds,
validation_data=valid_ds,
epochs=EPOCHS,
callbacks=[learn_control, early_stop]
)
Epoch 1/30 319/319 [==============================] - 42s 130ms/step - loss: 1.1467 - accuracy: 0.5000 - val_loss: 1.1653 - val_accuracy: 0.4788 - lr: 2.5000e-04 Epoch 2/30 319/319 [==============================] - 40s 125ms/step - loss: 1.1340 - accuracy: 0.5094 - val_loss: 1.1613 - val_accuracy: 0.4898 - lr: 2.5000e-04 Epoch 3/30 319/319 [==============================] - 44s 139ms/step - loss: 1.1475 - accuracy: 0.5059 - val_loss: 1.1685 - val_accuracy: 0.4765 - lr: 2.5000e-04 Epoch 4/30 319/319 [==============================] - ETA: 0s - loss: 1.1395 - accuracy: 0.5031 Epoch 4: ReduceLROnPlateau reducing learning rate to 0.0001250000059371814. 319/319 [==============================] - 44s 139ms/step - loss: 1.1395 - accuracy: 0.5031 - val_loss: 1.1612 - val_accuracy: 0.4906 - lr: 2.5000e-04 Epoch 5/30 319/319 [==============================] - 45s 140ms/step - loss: 1.1279 - accuracy: 0.5188 - val_loss: 1.1585 - val_accuracy: 0.4843 - lr: 1.2500e-04 Epoch 6/30 319/319 [==============================] - 42s 131ms/step - loss: 1.1200 - accuracy: 0.5200 - val_loss: 1.1549 - val_accuracy: 0.4945 - lr: 1.2500e-04 Epoch 7/30 319/319 [==============================] - 40s 125ms/step - loss: 1.1175 - accuracy: 0.5121 - val_loss: 1.1502 - val_accuracy: 0.4843 - lr: 1.2500e-04 Epoch 8/30 319/319 [==============================] - 40s 125ms/step - loss: 1.1293 - accuracy: 0.5094 - val_loss: 1.1518 - val_accuracy: 0.4882 - lr: 1.2500e-04 Epoch 9/30 319/319 [==============================] - 40s 126ms/step - loss: 1.1273 - accuracy: 0.5127 - val_loss: 1.1492 - val_accuracy: 0.4953 - lr: 1.2500e-04 Epoch 10/30 319/319 [==============================] - 40s 124ms/step - loss: 1.1023 - accuracy: 0.5161 - val_loss: 1.1508 - val_accuracy: 0.5024 - lr: 1.2500e-04 Epoch 11/30 319/319 [==============================] - ETA: 0s - loss: 1.1130 - accuracy: 0.5251 Epoch 11: ReduceLROnPlateau reducing learning rate to 6.25000029685907e-05. 319/319 [==============================] - 40s 125ms/step - loss: 1.1130 - accuracy: 0.5251 - val_loss: 1.1556 - val_accuracy: 0.4937 - lr: 1.2500e-04 Epoch 12/30 319/319 [==============================] - 40s 125ms/step - loss: 1.1178 - accuracy: 0.5192 - val_loss: 1.1478 - val_accuracy: 0.4976 - lr: 6.2500e-05 Epoch 13/30 319/319 [==============================] - 40s 125ms/step - loss: 1.1218 - accuracy: 0.5123 - val_loss: 1.1490 - val_accuracy: 0.4890 - lr: 6.2500e-05 Epoch 14/30 319/319 [==============================] - ETA: 0s - loss: 1.1019 - accuracy: 0.5221 Epoch 14: ReduceLROnPlateau reducing learning rate to 3.125000148429535e-05. 319/319 [==============================] - 40s 125ms/step - loss: 1.1019 - accuracy: 0.5221 - val_loss: 1.1478 - val_accuracy: 0.4890 - lr: 6.2500e-05 Epoch 15/30 319/319 [==============================] - 40s 124ms/step - loss: 1.1092 - accuracy: 0.5137 - val_loss: 1.1480 - val_accuracy: 0.4922 - lr: 3.1250e-05 Epoch 16/30 319/319 [==============================] - 40s 125ms/step - loss: 1.1176 - accuracy: 0.5106 - val_loss: 1.1476 - val_accuracy: 0.4922 - lr: 3.1250e-05 Epoch 17/30 319/319 [==============================] - 40s 125ms/step - loss: 1.1005 - accuracy: 0.5263 - val_loss: 1.1480 - val_accuracy: 0.4929 - lr: 3.1250e-05 Epoch 18/30 319/319 [==============================] - ETA: 0s - loss: 1.1093 - accuracy: 0.5163 Epoch 18: ReduceLROnPlateau reducing learning rate to 1.5625000742147677e-05. 319/319 [==============================] - 40s 124ms/step - loss: 1.1093 - accuracy: 0.5163 - val_loss: 1.1482 - val_accuracy: 0.4937 - lr: 3.1250e-05 Epoch 19/30 319/319 [==============================] - ETA: 0s - loss: 1.1074 - accuracy: 0.5184Restoring model weights from the end of the best epoch: 16. 319/319 [==============================] - 40s 126ms/step - loss: 1.1074 - accuracy: 0.5184 - val_loss: 1.1488 - val_accuracy: 0.4929 - lr: 1.5625e-05 Epoch 19: early stopping
# append new history
for metric, new_values in new_history.history.items():
history.history[metric].extend(new_values)
Graph of metrics of both fit calls:
plot_history(history)
Do a round of fine-tuning of the entire model.
Let's unfreeze the base model and train the entire model end-to-end with a low learning rate.
b3.trainable = True
model.compile(
optimizer = keras.optimizers.Adam(1e-5), # Low learning rate
loss = 'categorical_crossentropy',
metrics = ['accuracy']
)
EPOCHS = 10
fine_tune_history = model.fit(train_ds,
validation_data=valid_ds,
epochs=EPOCHS,
callbacks=[learn_control, early_stop]
)
Epoch 1/10 319/319 [==============================] - 154s 429ms/step - loss: 1.5122 - accuracy: 0.3611 - val_loss: 1.3694 - val_accuracy: 0.4091 - lr: 1.0000e-05 Epoch 2/10 319/319 [==============================] - 135s 424ms/step - loss: 1.2711 - accuracy: 0.4542 - val_loss: 1.3088 - val_accuracy: 0.4334 - lr: 1.0000e-05 Epoch 3/10 319/319 [==============================] - 135s 423ms/step - loss: 1.1458 - accuracy: 0.4996 - val_loss: 1.2769 - val_accuracy: 0.4459 - lr: 1.0000e-05 Epoch 4/10 319/319 [==============================] - 134s 421ms/step - loss: 1.0451 - accuracy: 0.5511 - val_loss: 1.2623 - val_accuracy: 0.4428 - lr: 1.0000e-05 Epoch 5/10 319/319 [==============================] - 134s 420ms/step - loss: 0.9684 - accuracy: 0.5931 - val_loss: 1.2483 - val_accuracy: 0.4530 - lr: 1.0000e-05 Epoch 6/10 319/319 [==============================] - 134s 421ms/step - loss: 0.8966 - accuracy: 0.6230 - val_loss: 1.2460 - val_accuracy: 0.4600 - lr: 1.0000e-05 Epoch 7/10 319/319 [==============================] - 137s 428ms/step - loss: 0.7612 - accuracy: 0.6942 - val_loss: 1.2584 - val_accuracy: 0.4663 - lr: 1.0000e-05 Epoch 9/10 319/319 [==============================] - ETA: 0s - loss: 0.7067 - accuracy: 0.7194 Epoch 9: ReduceLROnPlateau reducing learning rate to 4.999999873689376e-06. 319/319 [==============================] - 134s 420ms/step - loss: 0.7067 - accuracy: 0.7194 - val_loss: 1.2523 - val_accuracy: 0.4726 - lr: 1.0000e-05 Epoch 10/10 319/319 [==============================] - ETA: 0s - loss: 0.6447 - accuracy: 0.7667Restoring model weights from the end of the best epoch: 7. 319/319 [==============================] - 135s 424ms/step - loss: 0.6447 - accuracy: 0.7667 - val_loss: 1.2578 - val_accuracy: 0.4749 - lr: 5.0000e-06 Epoch 10: early stopping
model.save('/content/drive/MyDrive/efficientB3_fine_model.tf', save_format='tf', include_optimizer=True)
INFO:tensorflow:Assets written to: /content/drive/MyDrive/efficientB3_fine_model.tf/assets
plot_history(fine_tune_history)
Validation data predictions:
true_labels = balanced_valid['level'].values
predictions = collect_predictions(model, valid_ds)
print(classification_report(true_labels, predictions, target_names=CLASS_NAMES, zero_division=0))
display_confusion_matrix(true_labels, predictions, 'EfficientNetB3 CNN - Confusion Matrix of Diabetic Retinopathy Images\n')
100%|██████████| 80/80 [00:19<00:00, 4.16it/s]
precision recall f1-score support
Healthy 0.50 0.66 0.57 379
Mild 0.34 0.26 0.30 274
Moderate 0.42 0.28 0.34 304
Proliferative 0.51 0.55 0.53 181
Severe 0.52 0.61 0.56 138
accuracy 0.46 1276
macro avg 0.46 0.47 0.46 1276
weighted avg 0.45 0.46 0.45 1276
Now that's an interesting result we got there.
It seems that the model managed to comprehend some essential features of each category.
Let's observe SHAP values of this model to get explanation regarding the features that the model has learned:
image_batch, label_batch = valid_ds[0]
x0 = image_batch[13]
x2 = image_batch[0]
x4 = image_batch[1]
fig, axis = plt.subplots(1, 3, figsize=(14,10));
axis[0].imshow(x0); axis[0].set_title('Healthy');
axis[1].imshow(x2); axis[1].set_title('Moderate');
axis[2].imshow(x4); axis[2].set_title('Severe');
def f(x):
return model(x)
masker = shap.maskers.Image("inpaint_telea", (HEIGHT, WIDTH, CHANNELS))
explainer = shap.Explainer(f, masker, output_names=CLASS_NAMES)
shap_values = explainer(np.array([x0]), max_evals=1000, batch_size=BATCH_SIZE, outputs=shap.Explanation.argsort.flip[:5])
shap.image_plot(shap_values)
Partition explainer: 2it [01:29, 89.12s/it]
Although this sample was categorized correctly by the model, it is still not clear enough whether its a healthy sample. If you look closely you can see development of hard exudates around the upper blood vessels, but this may be false because of the poor image quality.
This is a good example that shows how much effect the image quality actually has.
shap_values = explainer(np.array([x2]), max_evals=1000, batch_size=BATCH_SIZE, outputs=shap.Explanation.argsort.flip[:5])
shap.image_plot(shap_values)
Partition explainer: 2it [01:30, 90.17s/it]
This sample was not categorized correctly according to the dataset, instead of moderate (in second place), it got proliferative. It is worth mentioning that this case just might realy be proliferative due to the condition if we look at it.
The model detected many features of the proliferative case.
shap_values = explainer(np.array([x4]), max_evals=1000, batch_size=BATCH_SIZE, outputs=shap.Explanation.argsort.flip[:5])
shap.image_plot(shap_values)
Partition explainer: 2it [01:27, 87.18s/it]
Here the model predicted severe correctly according to the profound development of white cotton wool spots, hard exudates and some abnormal and damaged veins.
We cannot perform a center crop of the images to 300x300 because we lose alot of information
on the edges of the eye, and this information
may be critical.
Therefore we will resize the image to 300x300 as before, but this time we'll use a differently balanced dataset.
The new balance of the dataset is as follows:
Unlike the previous dataset, this one includes much more moderate, mild, and healthy samples.
But not to many to not cause too much bias.
To feed the network with images of shape 300x300, instead of resizing from 512x512, we can perform a center crop to get the central fraction of the image without the black edges. This may be good for us because it gets rid of the bias that caused by it. (discussed above).
Let us see how this center crop affects the images:
# 300 / 512 gives ~0.58 which is ~298 pixels
l = os.listdir(BETTER_IMAGES_PATH)[0:16]
fig, axis = plt.subplots(4, 4, figsize=(20,20))
for i in range(4):
for j in range(4):
img = cv2.imread(BETTER_IMAGES_PATH + '/' + l[4*i + j]) # 512x512
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = tf.image.central_crop(img, central_fraction=0.58) # center crop
axis[i][j].imshow(img)
axis[i][j].axis('off')
print(img.shape)
(298, 298, 3)
The information n the edges disappears and that is not good enough for us because there may be critical features of sample that lay somewhere in the edges that may profoundly affect the case severity.
May be EfficientNetB6 that has input resolution of 528x528 may be a good attempt later ?
Define the new balanced dataset:
# sample k number of random samples of each category
balanced = pd.concat([sampling_k_elements(g0, k=5000).reset_index(drop=True),
sampling_k_elements(g1, k=2438).reset_index(drop=True),
sampling_k_elements(g2, k=4500).reset_index(drop=True),
sampling_k_elements(g3, k=872).reset_index(drop=True),
sampling_k_elements(g4, k=708).reset_index(drop=True)],
ignore_index=True)
balanced = balanced.sample(frac=1).reset_index(drop=True) # shuffle the dataframe
# notice we refer to the balanced dataset
split_index = int(len(balanced) * (0.8))
balanced_train = balanced.iloc[: split_index]
balanced_valid = balanced.iloc[split_index: ]
print('Train dataset length: ', len(balanced_train))
print('Valid dataset length: ', len(balanced_valid))
Train dataset length: 10814 Valid dataset length: 2704
BATCH_SIZE = 16
HEIGHT = 300
WIDTH = 300
CHANNELS = 3
NUM_CLASSES = 5
CLASS_NAMES = ['Healthy', 'Mild', 'Moderate', 'Proliferative', 'Severe']
train_ds = CustomDataGen(balanced_train,
BATCH_SIZE, BETTER_IMAGES_PATH,
norm=False,
resize=True, width=WIDTH, height=HEIGHT,
onehot=True)
valid_ds = CustomDataGen(balanced_valid,
BATCH_SIZE, BETTER_IMAGES_PATH,
norm=False,
resize=True, width=WIDTH, height=HEIGHT,
onehot=True)
tf.keras.backend.clear_session()
b3 = tf.keras.applications.efficientnet.EfficientNetB3(
include_top=False,
weights='imagenet',
input_shape=(HEIGHT, WIDTH, CHANNELS)
)
b3.trainable = True # later we'll do a fine tuning with False
model = tf.keras.Sequential()
model.add(b3)
model.add(L.GlobalAveragePooling2D())
model.add(L.Dropout(0.2))
model.add(L.BatchNormalization())
model.add(L.Dense(NUM_CLASSES, activation='softmax'))
optimizer = keras.optimizers.Adam(learning_rate=1e-4, decay=1e-6)
model.compile(optimizer = optimizer,
loss='categorical_crossentropy',
metrics = ['accuracy'])
plot_model(model, to_file='/content/convnet.png', show_shapes=False, show_layer_names=True)
Image(filename='convnet.png')
learn_control = ReduceLROnPlateau(monitor='val_loss', patience=5, verbose=1, factor=0.2, min_lr=1e-7)
early_stop = EarlyStopping(monitor='val_loss', patience=5, verbose=1, restore_best_weights=False) # to take the latest weights
EPOCHS = 50
history = model.fit(train_ds,
validation_data=valid_ds,
epochs=EPOCHS,
callbacks=[learn_control, early_stop]
)
Epoch 1/50 676/676 [==============================] - 301s 425ms/step - loss: 1.5373 - accuracy: 0.4091 - val_loss: 1.2222 - val_accuracy: 0.5181 - lr: 1.0000e-04 Epoch 2/50 676/676 [==============================] - 285s 421ms/step - loss: 1.0618 - accuracy: 0.5822 - val_loss: 1.1395 - val_accuracy: 0.5547 - lr: 1.0000e-04 Epoch 3/50 676/676 [==============================] - 284s 419ms/step - loss: 0.8054 - accuracy: 0.6811 - val_loss: 1.1530 - val_accuracy: 0.5695 - lr: 1.0000e-04 Epoch 4/50 676/676 [==============================] - 283s 419ms/step - loss: 0.5287 - accuracy: 0.8010 - val_loss: 1.3243 - val_accuracy: 0.5381 - lr: 1.0000e-04 Epoch 5/50 676/676 [==============================] - 283s 419ms/step - loss: 0.3292 - accuracy: 0.8795 - val_loss: 1.4387 - val_accuracy: 0.5510 - lr: 1.0000e-04 Epoch 6/50 676/676 [==============================] - 285s 421ms/step - loss: 0.2256 - accuracy: 0.9186 - val_loss: 1.6122 - val_accuracy: 0.5366 - lr: 1.0000e-04 Epoch 7/50 676/676 [==============================] - ETA: 0s - loss: 0.1585 - accuracy: 0.9454 Epoch 7: ReduceLROnPlateau reducing learning rate to 1.9999999494757503e-05. 676/676 [==============================] - 285s 421ms/step - loss: 0.1585 - accuracy: 0.9454 - val_loss: 1.8122 - val_accuracy: 0.5573 - lr: 1.0000e-04 Epoch 7: early stopping
model.save('/content/efficientB3_trainable_first_model.tf')
INFO:tensorflow:Assets written to: /content/efficientB3_trainable_first_model.tf/assets
The valiation loss kept rising, so early stop was activated.
plot_history(history)
The valiation loss increases du to overfitting towards the healthy samples over time. (epochs)
Now let's see the confusion matrix on the validation set
true_labels = balanced_valid['level'].values
predictions = collect_predictions(model, valid_ds)
100%|██████████| 169/169 [00:40<00:00, 4.22it/s]
print(classification_report(true_labels, predictions, target_names=CLASS_NAMES, zero_division=0))
display_confusion_matrix(true_labels, predictions, 'EfficientNetB3 CNN - Confusion Matrix of Diabetic Retinopathy Images\n')
precision recall f1-score support
Healthy 0.62 0.68 0.65 992
Mild 0.33 0.27 0.30 523
Moderate 0.59 0.61 0.60 861
Proliferative 0.50 0.37 0.42 171
Severe 0.60 0.63 0.61 157
accuracy 0.56 2704
macro avg 0.53 0.51 0.52 2704
weighted avg 0.55 0.56 0.55 2704
It is quite similar to the previous attempt, where we first trained.
Now we shall try to perform predictions of a new test set.
This test set is equaly balanced (708 samples for each class) and the samples are randomly
chosen from the entire samples dataset. (The original dataset).
test_df = pd.concat([sampling_k_elements(g0, k=708).reset_index(drop=True),
sampling_k_elements(g1, k=708).reset_index(drop=True),
sampling_k_elements(g2, k=708).reset_index(drop=True),
sampling_k_elements(g3, k=708).reset_index(drop=True),
sampling_k_elements(g4, k=708).reset_index(drop=True)],
ignore_index=True)
test_df = test_df.sample(frac=1).reset_index(drop=True) # shuffle the dataframe
print('Length of test dataset: ', len(test_df))
Length of test dataset: 3540
test_ds = CustomDataGen(test_df,
batch_size=1, path=BETTER_IMAGES_PATH, # notice batch size 1
norm=False,
resize=True, width=WIDTH, height=HEIGHT,
onehot=True)
true_labels = test_df['level'].values
predictions = collect_predictions(model, test_ds) # predict on every single image
100%|██████████| 3540/3540 [09:08<00:00, 6.45it/s]
print(classification_report(true_labels, predictions, target_names=CLASS_NAMES, zero_division=0))
display_confusion_matrix(true_labels, predictions, 'EfficientNetB3 CNN - Test Set Confusion Matrix of Diabetic Retinopathy Images\n')
precision recall f1-score support
Healthy 0.77 0.72 0.74 708
Mild 0.78 0.82 0.80 708
Moderate 0.72 0.84 0.78 708
Proliferative 0.95 0.85 0.90 708
Severe 0.95 0.91 0.93 708
accuracy 0.83 3540
macro avg 0.83 0.83 0.83 3540
weighted avg 0.83 0.83 0.83 3540
Looks marvelous !!!
The model succesfully predicted each category with minor error for our given test set.
We shall perform a prediction on the entire 35108 samples dataset and draw its confusion matrix and report.
all_ds = CustomDataGen(df,
batch_size=1, path=BETTER_IMAGES_PATH,
norm=False,
resize=True, width=WIDTH, height=HEIGHT,
onehot=True)
true_labels = df['level'].values
predictions = collect_predictions(model, all_ds)
100%|██████████| 35108/35108 [1:28:59<00:00, 6.57it/s]
print(classification_report(true_labels, predictions, target_names=CLASS_NAMES, zero_division=0))
display_confusion_matrix(true_labels, predictions, 'EfficientNetB3 CNN - Prediction on the Whole Dataset\nConfusion Matrix of Diabetic Retinopathy Images\n')
precision recall f1-score support
Healthy 0.96 0.73 0.83 25802
Mild 0.32 0.80 0.45 2438
Moderate 0.59 0.85 0.69 5288
Proliferative 0.78 0.84 0.81 872
Severe 0.73 0.91 0.81 708
accuracy 0.76 35108
macro avg 0.68 0.83 0.72 35108
weighted avg 0.85 0.76 0.78 35108
The F1 scores show us a pretty good stats. Above 80% prediction success rate for
the Proliferative, Severe, and Healthy samples.
Notice that it does a very good job distinguishing between Proliferative and Severe casses.
Only 23 Proliferative samples were diagnosed to be Severe and only 15 Severe samples were diagnosed as Proliferative.
But!, even though the model did not predict impressively for the Mild and Moderate samples, it is worth noticing that it predicted about 7000 healthy samples as Mild or Moderate. (First row).
This model prefers (most of the times) to predict a false positive DR diagnosis for Healthy samples rather than predict-
a true negative for actual Mild/Moderate casses.
The low F1 score for Mild samples can be explained by the poor quality of the original images. Some healthy images are come in poor quality in the first place, so that our image processing technique wouldn't be able to enhance any slightly profound features.
As the saying: "Its not the dancer, its the curved floor". 😏
Even the diagnosis of the doctors may be in-precise in some samples. It's a long and complicated job to figure out which samples are inconsistent with the correct diagnosis.
The bottom line is that we possess a CNN model that is capble to categorize DR samples with "good" precision.
image_batch, label_batch = valid_ds[7]
x0 = image_batch[8]
x2 = image_batch[0]
x4 = image_batch[3]
fig, axis = plt.subplots(1, 3, figsize=(14,10));
axis[0].imshow(x0); axis[0].set_title('Healthy'); axis[0].axis('off');
axis[1].imshow(x2); axis[1].set_title('Moderate'); axis[1].axis('off');
axis[2].imshow(x4); axis[2].set_title('Severe'); axis[2].axis('off');
def f(x):
return model(x)
masker = shap.maskers.Image("inpaint_telea", (HEIGHT, WIDTH, CHANNELS))
explainer = shap.Explainer(f, masker, output_names=CLASS_NAMES)
shap_values = explainer(np.array([x0]), max_evals=4000, batch_size=BATCH_SIZE, outputs=shap.Explanation.argsort.flip[:3]);
shap.image_plot(shap_values);
Partition explainer: 2it [05:48, 348.20s/it]
The above SHAP values mainly focus the healthy tissue near the Fovea (the central cirle) and the healthy blood vessels.
shap_values = explainer(np.array([x2]), max_evals=4000, batch_size=BATCH_SIZE, outputs=shap.Explanation.argsort.flip[:3]);
shap.image_plot(shap_values);
Partition explainer: 2it [06:00, 360.52s/it]
The SHAP values focus on the abnormal blood vessels and aneurysms.
shap_values = explainer(np.array([x4]), max_evals=4000, batch_size=BATCH_SIZE, outputs=shap.Explanation.argsort.flip[:3]);
shap.image_plot(shap_values);
Partition explainer: 2it [06:04, 364.15s/it]
The above SHAP values show us the profound features of severe and proliferative DR such as: "Cotton Wool" spots and abnormal blood vessels.
Classic Sklearn algorithms preformance on HOG vectors
| SVC RBF Kernel | SGD | MultinomialNB | |
|---|---|---|---|
| Macro Average F1-score | 0.22 | 0.2 | 0.19 |
CNN preformance on images
| ResNet50 - attempt 1 | EfficientB3 - attempt 2 | EfficientB3 - attempt 3 | EfficientB3 - attempt 4 | |
|---|---|---|---|---|
| Macro Average F1-score | 0.17 | 0.58 | 0.46 | 0.72 |
Due to time constraints I did not manage to implement and check these ideas.
I knew that tackling the Diabetic Retniopathy problem was not an easy task what so ever.
Just by looking at the data and examining all the categories, I could barely notice the subtle differences between neighboring DR severity levels. 2 and 3 looked very similar, 3 and 4 looked the same.
Therefore it sparked me with genuine interest on whether a machine learning or deep learning model will be able to distinguish between the different features.
Along the path to reach the good model from above (Last attempt with EfficientNetB3) I encountered many difficulties and challenges.
This course is actually an intro for me to the vast world of data science, machine learning and deep learning.
Most of the concepts that are presented throughout this notebook were completely new for me, and I learned them along the course.
I read major parts of this book: "Deep Learning" by Ian Goodfellow, and others, and also watched Prof. Alex Bronstein's Youtube series: "Deep Learning on Computational Accelerators".
These materials gave me a solid ground knowledge to begin tackling the problem with sufficient understanding of what I'm doing and what I want to acheive.
The course provided me a window to the actual real world tools with good explanations, so I am greatful for that.
Using Google's colab envioroment was very helpful. I enjoyed it. The envioroment is very intuitive and integrates great with Google Drive.
I'm actually pretty impressed with myself that I manged to pull this of.
While enlisting to this course I was aware of the knowledge gap, and decided to take the challenge upon myself, because I believe in my learning abilities.
I think it turned out well for me and I can say that this course was one of the most instructive courses I had during my studies.